Abstract

Characterizing quantum correlations in terms of information-theoretic
principles is a popular chapter of quantum foundations. Traditionally, the
principles used for this purpose have been expressed in terms of conditional
probability distributions, specifying the probability that a black box produces
a certain output upon receiving a certain input. This approach is known as
device-independent. Another major chapter of quantum foundations is the
information-theoretic characterization of quantum theory as a whole, with its
sets of states and measurements, and with its allowed dynamics. The different
frameworks adopted for this purpose are known under the umbrella term of
general probabilistic theories. With only a few exceptions, the two research
programmes on characterizing quantum correlations and characterizing quantum
theory have so far proceeded on separate tracks, each one developing its own
methods and its own agenda. Still, both programmes share the same basic goal: a
new and better understanding of quantum mechanics in information-theoretic
terms. In view of this, it is quite striking that the connections between the
two programmes are still largely undeveloped. This paper aims at bridging the
gap by presenting a “Rosetta stone” for the two frameworks and by illustrating
how the two programmes can benefit each other.

As a case study, we focus on two device-independent features known as Local
Orthogonality (LO) and Consistent Exclusivity (CE). In a recent work [1], we
showed that CE and LO can be derived from the basic idea that, at the
fundamental level, measurements are repeatable and minimally disturbing. In
this paper we provide a new, alternative derivation based on a different set of
principles, revolving around the notion of pure orthogonal measurement: a
measurement that cannot be further refined and that identifies states without
error. The first principle, Measurement Purification, states that every
measurement can be reduced to a pure orthogonal measurement by adding an
auxiliary system and by coarse-graining over some outcomes. The second
principle, Locality of Pure Orthogonal Measurements, states that two pure
orthogonal measurements performed independently on two systems yield a pure
orthogonal measurement on the composite system. The third principle, Strong No
Disturbance Without Information, states that every measurement that does not
extract information about a source can be realized without disturbing the
states in that source and without disturbing the pure orthogonal measurements
that identify states in that source. These three principles together imply LO.
CE is then derived by adding a fourth principle, called Pure State
Identification, stating that every outcome of a pure orthogonal measurement
identifies a pure state.


1. Introduction 


One of the most profound mysteries of quantum theory is nonlocality [?],
namely the fact that experiments performed at spacelike separated locations
can exhibit stronger correlations than those allowed by any local realistic
model. Still, quantum correlations are not the strongest correlations one can
imagine: the assumption that correlations cannot be used to communicate at
unbounded speed, known as No-Signalling, is compatible with a larger set of
exotic, non-quantum correlations [?]. This observation stimulated the search
for other principles, of similar information-theoretic flavour, aimed at
achieving a complete characterization. Up to now, several principles for
quantum correlations have been proposed, such as Non-Trivial Communication
Complexity [?], No-Advantage in Nonlocal Computation [?], Information
Causality [9], Macroscopic Locality [10], and Local Orthogonality (LO) [11].
These principles have been spectacularly successful in constraining the
allowed correlations, narrowing them to a set that is close to the quantum
set. However, no combination of the presently known principles is sufficient
to characterize the quantum set completely [12]. Similar considerations apply
to the study of quantum contextuality [?], where the principles of Consistent
Exclusivity (CE) [?] and Macroscopic Non-Contextuality [?] have been proposed
in order to characterize the degree of contextuality exhibited by projective
measurements in quantum theory. Also in this case, a complete
information-theoretic characterization of the contextuality bounds satisfied
by projective quantum measurements is still missing.

On the other hand, several reconstructions of quantum theory from
information-theoretic principles have been proposed in recent years
[20, ..., 23, ..., 27, ...]. With different background assumptions and
slightly different goals, these works single out the Hilbert space formalism
of quantum theory: in particular, they imply that physical systems are
associated with Hilbert spaces, that states are described by density matrices,
and that the probabilities of measurement outcomes are computed with the Born
rule. As a byproduct, they also characterize the particular sets of
correlations arising in quantum theory. Is this a solution to the long
sought-after characterization of quantum nonlocality and contextuality? Yes
and no. Yes, because every set of information-theoretic principles that
singles out quantum theory also provides an information-theoretic
justification of quantum nonlocality and contextuality. And no, because such a
justification may not be as satisfactory as one may desire: ideally, one would
like to have principles that directly imply bounds on correlations, without
the detour of a full derivation of the Hilbert space framework.

The importance of a direct characterization is reflected in the nature of the
principles used for quantum correlations. Principles like those in Refs. [?]
refer only to the conditional probabilities of obtaining output data from
input data, without making any assumption on the process generating the output
from the input. The framework in which these principles are formulated has
been aptly named device-independent (see [30] and [31] for an introduction).
In stark contrast, the framework used for reconstructing quantum theory is not
device-independent. And for good reasons: a full-fledged physical theory is
not only about input-output probability distributions, but also about physical
systems and how they can interact through physical processes [32]. Naturally,
the principles used for reconstructing quantum theory presuppose that
experimental data have already been organized in the basic structure of a
physical theory, which has systems, states, transformations, and measurements
at its backbone. For example, principles like Local Tomography [24, ...] or
Ideal Compression [?] explicitly refer to the “states of a given physical
system”, to the “measurements performed on a composite system”, and to the
“processes that transform a system into another”. The formulation of these
principles is based on the framework of general probabilistic theories [?],
which describes on the same footing classical and quantum theory, as well as
many hypothetical, post-quantum theories.

Up to now, the research programme on reconstructing quantum theory and that
on characterizing quantum correlations have proceeded along separate tracks.
However, it is clear that the interaction between these two programmes has a
potential for understanding the picture of reality (if any) at which quantum
theory is hinting. In a recent work [1], we started exploring the relations
between principles for correlations and principles for general probabilistic
theories. We first defined a class of ideal measurements, called sharp
measurements, which represent an ideal standard of measurements that are
repeatable and cause minimal disturbance on future observations. Then we
postulated that all measurements are fundamentally sharp, i.e. they can be
obtained by performing a joint sharp measurement on the system together with
an environment. Combining this requirement with two additional requests about
the compositional properties of sharp measurements, we have been able to
derive the validity of Local Orthogonality and Consistent Exclusivity.

In this paper we present an alternative derivation of LO and CE, based on a
different notion of “ideal measurement” and on a different set of physical
principles. We take LO and CE as the subject of this further study because
they provide the simplest testbed for investigating the interplay between
device-independent and device-dependent notions. We will first discuss the
relations between the device-independent framework and the framework of
general probabilistic theories. Then we will set the scene for the derivation
of LO and CE, defining a privileged class of measurements, here called spiky
measurements. Spiky measurements are obtained by coarse-graining pure
orthogonal measurements, i.e. measurements that cannot be further refined and
that identify some states without error. In quantum theory, spiky measurements
coincide with projective measurements, which in turn coincide with the sharp
measurements defined in Ref. [1]. In a general theory, however, spiky and
sharp measurements can potentially differ. Using the notion of pure orthogonal
measurement, we then formulate three requirements which allow one to derive LO
and, under an additional assumption, CE. Interestingly, our requirements do
not include Causality [37], which is instead derived as a byproduct.
Nevertheless, one of the requirements, called Sufficient Orthogonality, has no
immediate operational interpretation. To address this issue, we show how
Sufficient Orthogonality can be reduced to a strong version of the No
Information Without Disturbance property discussed in Refs. [?]. Such a
reduction, alas, assumes Causality.

The paper is structured as follows: in Sections 2 and 3 we present the
device-independent framework and the framework of general probabilistic
theories, respectively. The bridge between the two frameworks is provided in
Section 4, where we specify the physical model in which the input-output
probabilities are generated. The relation between No-Signalling at the level
of probability distributions and Causality at the level of physical processes
is discussed in Section 5. Sections 6 and 7 provide the definition of spiky
measurements and state three axioms about their structure. The three axioms
imply the validity of Local Orthogonality (briefly recalled in Section 8) and
Causality, as shown by the derivation in Section 9. Since one of the three
axioms has no compelling operational interpretation, in Section 10 we show one
way to reduce it to more fundamental physical statements, in this case
Causality and a strong version of the No Information Without Disturbance
principle. The analysis carried out for nonlocality is then applied to the
study of contextuality: we first review the device-independent framework and
illustrate CE as an example of device-independent principle (Section 11) and
then bridge it with the framework of general probabilistic theories (Section
12). In Section 13 we show different formulations of CE as a physical
principle regarding a privileged class of measurements, which generalize
projective measurements in quantum theory. Choosing spiky measurements as our
privileged class of measurements, we then provide a derivation of CE (Section
14). Finally, in Section 15 we compare the notion of spiky measurement, used
in this paper, with other potential generalizations of the notion of
projective measurement in quantum theory. The conclusions are drawn in Section
16. The Appendices present some technical proofs that are not of immediate
interest for the comprehension of the main points of the paper.
2. The device-independent framework for nonlocality


In this section we briefly review the device-independent framework for
nonlocality [?, 42, 43], pointing the reader to [30] for a more extended
discussion. The framework describes games where a group of players respond to
a set of possible questions posed by a referee. The strategy of the players is
described by the conditional probability distribution of the answers given the
questions. Regarding the questions as inputs and the answers as outputs, the
limitations on the physical theory that describes the players’ strategies are
encoded into limitations on the allowed input-output probability
distributions.


2.1. Non-local games 

A non-local game is a game involving N players and a referee, where the
referee gives to the i-th player an input $x_i$ in some input alphabet $X_i$
and the player returns an output $y_i$ in some output alphabet $Y_i$. For
brevity, let us denote by
$\mathbf{x} = (x_1, x_2, \dots, x_N) \in X_1 \times X_2 \times \cdots \times X_N =: \prod_{i=1}^N X_i$
the string of all inputs given by the referee and by
$\mathbf{y} = (y_1, y_2, \dots, y_N) \in Y_1 \times Y_2 \times \cdots \times Y_N =: \prod_{i=1}^N Y_i$
the string of all outputs returned by the players. In each run of the game,
the referee chooses the input string $\mathbf{x}$ at random according to a
probability distribution $q(\mathbf{x})$ and assigns a payoff
$w(\mathbf{x}, \mathbf{y})$ to the output string $\mathbf{y}$. The goal of the
players is to maximize their expected payoff, given by

$$
\omega = \sum_{\mathbf{x}} q(\mathbf{x}) \left[ \sum_{\mathbf{y}} w(\mathbf{x}, \mathbf{y}) \, p(\mathbf{y}|\mathbf{x}) \right],
\tag{1}
$$

where p(y|x) is the conditional probability that the players produce the output
y upon receiving the input x.
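
As a concrete illustration of Eq. (1), the following Python sketch evaluates the expected payoff of a small two-player game. The CHSH-type payoff function, the uniform input distribution, and the two strategies (a deterministic one and a Popescu-Rohrlich-type box) are illustrative assumptions that do not appear in the text; all function and variable names are hypothetical.

```python
from itertools import product

def expected_payoff(q, w, p, inputs, outputs):
    """Expected payoff of Eq. (1): sum_x q(x) * sum_y w(x, y) * p(y | x)."""
    return sum(
        q(x) * sum(w(x, y) * p(y, x) for y in outputs)
        for x in inputs
    )

# Illustrative two-player game: input and output strings are pairs of bits.
inputs = list(product((0, 1), repeat=2))
outputs = list(product((0, 1), repeat=2))

def q_uniform(x):
    # Assumed referee distribution: all input strings equally likely.
    return 1 / len(inputs)

def w_chsh(x, y):
    # Assumed payoff: 1 when y1 XOR y2 equals x1 AND x2, otherwise 0.
    return 1.0 if (y[0] ^ y[1]) == (x[0] & x[1]) else 0.0

def p_always_zero(y, x):
    # Deterministic strategy: both players answer 0 for every input.
    return 1.0 if y == (0, 0) else 0.0

def p_pr_box(y, x):
    # Popescu-Rohrlich-type box: random outputs with y1 XOR y2 = x1 AND x2.
    return 0.5 if (y[0] ^ y[1]) == (x[0] & x[1]) else 0.0

omega_det = expected_payoff(q_uniform, w_chsh, p_always_zero, inputs, outputs)
omega_pr = expected_payoff(q_uniform, w_chsh, p_pr_box, inputs, outputs)
```

Under these assumptions the deterministic strategy loses only on the input string (1, 1), while the box strategy wins on every input string, so the two payoffs differ even though both strategies are described purely by p(y|x).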


2.2. Principles about input-output distributions 

The input-output probability distribution p(y|x) describes the strategy of
the players in a black box fashion, disregarding the specific details of the
devices used to generate the outputs. Such a description, called
device-independent, is particularly suited for cryptographic applications [?].
In this context, the constraints on the allowed strategies are expressed as
constraints on the allowed probability distributions. For example, the most
common constraint in the literature is the No-Signalling principle [29, 4, 5],
which imposes that the correlations in the probability distribution p(y|x)
cannot be used to simulate classical communication among the players. For
N = 2, No-Signalling amounts to the set of linear constraints


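$$
\begin{aligned}
\sum_{y_2 \in Y_2} p(y_1, y_2 | x_1, x_2) &= \sum_{y_2 \in Y_2} p(y_1, y_2 | x_1, x_2') \qquad \forall x_2, x_2' \in X_2 \\
\sum_{y_1 \in Y_1} p(y_1, y_2 | x_1, x_2) &= \sum_{y_1 \in Y_1} p(y_1, y_2 | x_1', x_2) \qquad \forall x_1, x_1' \in X_1 .
\end{aligned}
\tag{2}
$$
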
For N > 2, no-signalling is imposed by partitioning the N players into two
disjoint groups and by imposing the above equations for all possible bipartitions.
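
The bipartite no-signalling constraints can be checked numerically. The sketch below is a minimal illustration for N = 2: it tests whether each player's marginal distribution is independent of the other player's input. The function name, the tolerance, and the two example distributions are hypothetical choices made for illustration only.

```python
from itertools import product

def is_no_signalling(p, X1, X2, Y1, Y2, tol=1e-12):
    """Check the N = 2 no-signalling constraints.

    p(y1, y2, x1, x2) returns the conditional probability p(y1, y2 | x1, x2).
    """
    # The marginal of player 1 must not depend on the input x2 of player 2.
    for x1, y1 in product(X1, Y1):
        marginals = [sum(p(y1, y2, x1, x2) for y2 in Y2) for x2 in X2]
        if max(marginals) - min(marginals) > tol:
            return False
    # The marginal of player 2 must not depend on the input x1 of player 1.
    for x2, y2 in product(X2, Y2):
        marginals = [sum(p(y1, y2, x1, x2) for y1 in Y1) for x1 in X1]
        if max(marginals) - min(marginals) > tol:
            return False
    return True

bits = (0, 1)

def pr_box(y1, y2, x1, x2):
    # Assumed example: random outputs correlated so that y1 XOR y2 = x1 AND x2.
    return 0.5 if (y1 ^ y2) == (x1 & x2) else 0.0

def signalling_box(y1, y2, x1, x2):
    # Assumed counterexample: player 1 outputs the input of player 2.
    return 1.0 if (y1 == x2 and y2 == 0) else 0.0

pr_is_ns = is_no_signalling(pr_box, bits, bits, bits, bits)
sig_is_ns = is_no_signalling(signalling_box, bits, bits, bits, bits)
```

The first distribution has uniform marginals for every choice of inputs and passes the check; the second fails because the output of player 1 reveals the input of player 2.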

Principles like Non-Trivial Communication Complexity [?], No-Advantage in
Nonlocal Computation [?], Information Causality [9], Macroscopic Locality [10]
and Local Orthogonality [11] are also examples of restrictions about
input-output probability distributions. For example, Non-Trivial Communication
Complexity is the requirement that the probability distribution p(y|x) should
not allow two players to compute arbitrary Boolean functions with a single bit
of classical communication.

Treating p(y|x) as a black box also allows for an interesting connection with
the framework of interactive proof systems [49], as highlighted in Refs.
[50, 51]. In short, one regards the N players as N untrusted provers and the
referee as a verifier, with the communication between provers and verifier
restricted to be classical. In this context, different physical principles
represent different constraints on the power of the provers. Starting from the
seminal work by Raz [52], the no-signalling constraint has been studied
extensively [53, 54, 55, 56]. It is natural to expect that other
information-theoretic principles, such as Non-Trivial Communication Complexity
or Information Causality, may also have interesting consequences for
interactive proof systems.


2.3. Characterizing quantum correlations 

The original purpose of nonlocal games is the study of quantum correlations.
Here one imagines a scenario where N parties prepare N quantum systems in a
joint quantum state and the i-th party generates the output $y_i$ by
performing a measurement on the i-th system, choosing the measurement settings
according to the input $x_i$. In this scenario, the probability distribution
p(y|x) has the form

$$
p(\mathbf{y}|\mathbf{x}) = \mathrm{Tr}\left[ \rho \left( P^{(1,x_1)}_{y_1} \otimes P^{(2,x_2)}_{y_2} \otimes \cdots \otimes P^{(N,x_N)}_{y_N} \right) \right],
\tag{3}
$$

where $\rho$ is the quantum state of the N systems and
$\{P^{(i,x_i)}_{y_i}\}_{y_i \in Y_i}$ is the Positive Operator-Valued Measure
(POVM; see footnote 2) describing the measurement performed by the i-th party
upon receiving the input $x_i$.

Input-output distributions that are generated as in Eq. (3) are called
quantum. For given N and given input/output alphabets, the set of quantum
input-output distributions is convex. Hence, characterizing it is equivalent
to characterizing the maximum payoffs achieved by strategies of the form (3)
in all possible games. Since the payoff $\omega$ in Eq. (1) can be viewed as a
correlation, the problem of characterizing the maximum payoffs is often
referred to as the problem of characterizing the set of quantum correlations.
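
To make the form of a quantum input-output distribution concrete, the sketch below evaluates Eq. (3) for N = 2 using plain Python lists as matrices. The maximally entangled two-qubit state, the real projective measurements, the specific measurement angles, and the CHSH-type winning condition are all illustrative assumptions, not part of the text; the helper names are hypothetical.

```python
import math

def projector(theta, outcome):
    """Rank-one projector on the real unit vector at angle theta (outcome 0)
    or on the orthogonal unit vector (outcome 1)."""
    c, s = math.cos(theta), math.sin(theta)
    v = (c, s) if outcome == 0 else (-s, c)
    return [[v[i] * v[j] for j in range(2)] for i in range(2)]

def kron(a, b):
    """Kronecker (tensor) product of two square matrices."""
    m = len(b)
    n = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n)]
            for i in range(n)]

def trace_of_product(a, b):
    """Tr[a b] for two square matrices of the same size."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

# Assumed state: rho = |Phi+><Phi+| with |Phi+> = (|00> + |11>)/sqrt(2).
phi = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]
rho = [[phi[i] * phi[j] for j in range(4)] for i in range(4)]

# Assumed measurement settings (angles) for each input bit of each party.
ANGLES_1 = {0: 0.0, 1: math.pi / 4}
ANGLES_2 = {0: math.pi / 8, 1: -math.pi / 8}

def p_quantum(y, x):
    """p(y|x) = Tr[rho (P_{y1}^{(1,x1)} tensor P_{y2}^{(2,x2)})], Eq. (3), N = 2."""
    effect = kron(projector(ANGLES_1[x[0]], y[0]),
                  projector(ANGLES_2[x[1]], y[1]))
    return trace_of_product(rho, effect)

bits = (0, 1)
# Expected payoff for uniform inputs and payoff 1 iff y1 XOR y2 = x1 AND x2.
win = sum(
    0.25 * p_quantum((y1, y2), (x1, x2))
    for x1 in bits for x2 in bits for y1 in bits for y2 in bits
    if (y1 ^ y2) == (x1 & x2)
)
```

With these assumed settings the payoff evaluates to cos²(π/8), roughly 0.85, which lies strictly between the value 0.75 reachable by deterministic strategies and the value 1 reachable by general no-signalling distributions in the same game.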


2 We recall that a POVM with outcomes in Y is defined as a collection of
non-negative operators $\{P_y\}_{y \in Y}$ satisfying the normalization
condition $\sum_{y \in Y} P_y = I$, I being the identity on the system’s
Hilbert space.
Clearly, the definition of “quantum input-output distribution” does refer to
the way the probability distribution is generated. It does so in two ways:

1. it prescribes that p(y|x) is generated by a specific operational procedure
(preparing a multipartite state and performing local measurements);

2. it specifies the physical theory (quantum theory) in which the procedure
is implemented.

Now the question is: can we characterize the set of quantum correlations
through device-independent principles? The principles proposed so far [?] are
important milestones towards the achievement of this goal. However, a complete
characterization of the quantum set solely in terms of conditional probability
distributions appears to be challenging [12].


3. The framework of operational-probabilistic theories 


Ultimately, the maximum payoff that the players can win in a non-local game
depends on the physical theory that underlies its implementation. Constraints
on the physical theory imply constraints on the conditional probability
distributions p(y|x) that the players can generate. For example, the
no-signalling conditions of Eq. (2) are often motivated by a space-time
scenario where physical systems travel at a bounded speed and the players are
far enough from one another that no signal can be exchanged among them during
a run of the game.

Among all possible theories, classical and quantum theories are the two
prominent examples, due to their central role in physics. However, in order to
understand what is specific about these two theories and to explore future
generalizations, it is convenient to step back from their specific details and
to place them in the wider context of general probabilistic theories [?] (see
also the contributed volume [57] for an introduction to the different
frameworks). Among the available frameworks, here we adopt the framework of
operational-probabilistic theories (OPTs) [?], which extends the language of
quantum circuits [?] to arbitrary physical theories, combining the categorical
framework initiated by Abramsky and Coecke [61, ...] with the toolbox of
elementary probability theory. An informal summary of the OPT framework is
provided in the following subsections. For a more formal exposition we direct
the reader to Ref. [?]. For more discussion on the physical assumptions at the
basis of the OPT framework we recommend Hardy’s recent works [?], which adopt
a closely related framework and provide a number of enlightening comments on
the relation between the operational and the theoretical level.


3.1. Operational structure


An OPT [37] describes the operations that an agent can perform on physical
systems. The theory specifies a catalog of (generally non-deterministic)
devices that the agent can compose with each other: each device transforms an
input system into an output system, generally in a stochastic way, producing a
random outcome x. We denote by $\mathcal{T} = \{\mathcal{T}_x\}_{x \in X}$ the
set of alternative transformations that can occur when a given device is used,
and we represent each transformation as

[Diagram: a box $\mathcal{T}_x$ with input wire A and output wire B]

where A and B are the input and output system, respectively. A collection
$\mathcal{T}$ describing the action of a device is called a test [37]. Whether
or not a given collection $\mathcal{T}$ is a “test” is determined by the
theory (see footnote 3).

A test with input A and output B is said to be of type A → B. As a special
case, the device can have no input, in which case its action consists in
preparing a system in a particular ensemble of states (see footnote 4)
$\rho = \{\rho_x\}_{x \in X}$. Each state of the ensemble is represented as

[Diagram: a state $\rho_x$ with output wire B]

where B is the system prepared by the device. In equations, we will often use
the Dirac-like notation $|\rho_x)$. Likewise, a device can have trivial
output, in which case its action results in a demolition measurement
$\mathbf{m} = \{m_x\}_{x \in X}$, which absorbs the system and produces an
outcome with some probability. We represent each transformation in the
measurement as

[Diagram: an effect $m_x$ with input wire A]

where A is the input system undergoing the measurement. In equations we will
often use the Dirac-like notation $(m_x|$. Traditionally, the transformation
$m_x$ is called an effect [64]. A test $\mathcal{T}$ of type A → A can be
thought of as a non-demolition measurement of system A. We will use the
notation St(A), Transf(A → B), and Eff(B) to denote the sets of all states of
system A, all transformations of A into B, and all effects on system B,
respectively.

The simplest device that can act on a system A is the identity device, which
has only one possible outcome, corresponding to the identity transformation
$\mathcal{I}_A$. Like in quantum circuits, we represent the identity on system
A with just a wire. In general, we call a device with a single outcome
deterministic, because in that case we know for sure which transformation is
going to take place. The subsets consisting of deterministic states,
deterministic transformations, and deterministic effects will be denoted as
$\mathrm{St}_1(A)$, $\mathrm{Transf}_1(A \to B)$, and $\mathrm{Eff}_1(B)$,
respectively.

The notation $A \otimes B$ represents the composite system consisting of the
subsystems A and B. Composite systems are represented by multiple wires: for
example,


3 Essentially, the only constraints on the set of tests are those arising by
coarse-graining and by composition (see discussion later in this section). For
example, if two tests are composed in series or in parallel, then the
resulting collection is also a test.

4 In quantum theory, the ensemble $\{\rho_x\}_{x \in X}$ would consist of
unnormalized density matrices, with the trace of each matrix giving the
probability of the corresponding preparation.















[Diagram: a state $\rho$ with two output wires, A and B]

represents a state of the composite system $A \otimes B$. Devices can be
connected in parallel and in series, giving rise to circuits, such as

[Diagram: example circuit]



or, in equation (2p (g> M z )(J~ y ®Xc)p x . Circuits in an operational-probabilistic 
theory obey the same rules as circuits in quantum information. In fact, these
rules are already encapsulated in the graphical language used to represent
them, whose foundation lies in the theory of strict symmetric monoidal
categories. The idea that the definition of a physical theory should be based
on (strict) symmetric monoidal categories was introduced by Abramsky and
Coecke. A discussion of this idea, along with a comprehensive exposition of
the categorical framework, can be found in Coecke’s review [32].


3.2. Probabilistic structure 

When a preparation device with ensemble $\{\rho_x\}_{x \in X}$ is connected to
a measurement device $\{m_y\}_{y \in Y}$, the joint probability distribution
of the outcomes is written as

$$
p(x, y) = (m_y | \rho_x),
\tag{4}
$$

and is identified with the diagram

[Diagram: the state $\rho_x$ connected through wire A to the effect $m_y$]

Note that this is the joint probability that the preparation device gives the
random outcome x and the measurement device gives the outcome y. Accordingly,
it is normalized as

$$
\sum_{x \in X} \sum_{y \in Y} p(x, y) = 1 .
$$


Once probabilities are introduced, the sets of states, effects, and
transformations inherit a linear structure (see footnote 5), so that we can
think of each state, effect, or transformation as an element of a suitable
vector space [37, 40]. By construction, the action of a transformation on
states and effects is linear and, in particular, states (effects) are linear
functionals on effects (states) (see paragraph II.F of Ref. [37] and paragraph
2.3.3 of Ref. [40] for details).

Quantum theory can be cast in the framework of OPTs as a special example. 
Here systems are described by Hilbert spaces. A preparation device is described 


5 The linear structure is obtained through an operation of quotient, which
consists in identifying transformations that give the same probabilities in
all possible circuits. Note that the assumption of convexity is not made here:
the OPT framework can also be used to describe theories with non-convex state
spaces, such as Spekkens’ toy theory [67].
by an ensemble $\{\rho_x\}_{x \in X}$ of unnormalized density matrices, acting
on the system’s Hilbert space and satisfying the condition
$\sum_{x \in X} \mathrm{Tr}[\rho_x] = 1$. A measurement device is described by
a positive operator-valued measure (POVM), namely a collection
$\{P_y\}_{y \in Y}$ of non-negative operators satisfying the condition

$$
\sum_{y \in Y} P_y = I ,
\tag{5}
$$

where I is the identity operator on the system’s Hilbert space. The pairing
between states and effects is given by the Born rule

$$
(P_y | \rho_x) := \mathrm{Tr}[P_y \rho_x] .
\tag{6}
$$

A test with non-trivial input and output is a quantum instrument [68], i.e. a
collection of completely positive, trace non-increasing linear maps
$\{\mathcal{T}_y\}_{y \in Y}$, transforming operators on the input system’s
Hilbert space into operators on the output system’s Hilbert space and
satisfying the condition that the map $\sum_{y \in Y} \mathcal{T}_y$ is
trace-preserving. Classical theory can also be represented in this way, by
choosing density matrices and POVM operators that are diagonal in a fixed
basis, and quantum instruments that transform diagonal operators into diagonal
operators.

3.3. Coarse-graining 

A key notion that comes with the probabilistic structure is the notion of
coarse-graining: for a test $\mathcal{T} = \{\mathcal{T}_y\}_{y \in Y}$ one
can decide to identify some outcomes, thus obtaining another, coarse-grained
test. Mathematically, a coarse-graining is defined by a partition of the
outcome set Y into mutually disjoint subsets $\{Y_z\}_{z \in Z}$. The
coarse-grained test is the test $\mathcal{T}' = \{\mathcal{T}'_z\}_{z \in Z}$
defined by

$$
\mathcal{T}'_z := \sum_{y \in Y_z} \mathcal{T}_y .
\tag{7}
$$

Note that the summation is well-defined because transformations are elements
of a vector space (cf. paragraph II.F of Ref. [37] and paragraph 2.3.3 of Ref.
[40]).
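
The coarse-graining rule of Eq. (7) can be sketched in a few lines of Python. The example below represents each element of a test as a matrix (nested lists) and sums the elements within each block of the partition. The choice of a four-outcome qubit POVM and the outcome labels are illustrative assumptions; the function names are hypothetical.

```python
def add_matrices(a, b):
    """Entry-wise sum of two matrices of the same shape."""
    return [[a[i][j] + b[i][j] for j in range(len(a[0]))]
            for i in range(len(a))]

def coarse_grain(test, partition):
    """Return the coarse-grained test T'_z = sum of T_y over y in Y_z (Eq. (7)).

    test:      dict mapping each outcome y to a matrix (nested lists).
    partition: dict mapping each coarse outcome z to the list Y_z of outcomes.
    """
    covered = [y for subset in partition.values() for y in subset]
    # The subsets Y_z must be mutually disjoint and must cover the outcome set.
    if sorted(covered) != sorted(test):
        raise ValueError("partition must split the outcome set into disjoint subsets")
    coarse = {}
    for z, subset in partition.items():
        total = None
        for y in subset:
            total = test[y] if total is None else add_matrices(total, test[y])
        coarse[z] = total
    return coarse

# Assumed example: a four-outcome qubit POVM built from the Z and X bases,
# each basis being chosen with probability 1/2.
povm = {
    "z0": [[0.5, 0.0], [0.0, 0.0]],
    "z1": [[0.0, 0.0], [0.0, 0.5]],
    "x+": [[0.25, 0.25], [0.25, 0.25]],
    "x-": [[0.25, -0.25], [-0.25, 0.25]],
}
partition = {"Z": ["z0", "z1"], "X": ["x+", "x-"]}
coarse = coarse_grain(povm, partition)
```

In this assumed example each coarse-grained element equals half the identity, so the coarse-grained test only records which basis was used and carries no information about the state.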


4. Physical modelling of non-local games 

The OPT framework can be naturally applied to the study of nonlocal games. 

A strategy in a nonlocal game can be modelled as follows: 

1. The correlations shared by the N players are modelled by a joint state
$\rho$ of N systems $S_i$, i = 1, ..., N, with system $S_i$ in possession of
the i-th player. Here we restrict the attention to states $\rho$ that can be
prepared deterministically, that is, to states generated by a preparation
device with only one possible outcome.
2. Upon receiving the input $x_i$ from the referee, the i-th player will
produce an output by performing a measurement on system $S_i$. Note that in
this broad context, “measurement” can be any process that produces a classical
output $y_i$ given the input $x_i$ and the state of the system. Even
evaluating a function of $x_i$ on a computer and reading the result on the
screen would count as a “measurement”.

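Let us denote by $\mathbf{m}^{i,x_i} := \{m^{i,x_i}_{y_i}\}_{y_i \in Y_i}$ the
measurement performed by the i-th player upon receiving input $x_i$. The
conditional probability distribution p(y|x) generated by the measurements of
all players is then given by

$$
p(\mathbf{y}|\mathbf{x}) = \left( m^{1,x_1}_{y_1} \otimes m^{2,x_2}_{y_2} \otimes \cdots \otimes m^{N,x_N}_{y_N} \,\middle|\, \rho \right),
\tag{8}
$$

and corresponds to the diagram

[Diagram: the joint state $\rho$ of systems $S_1, \dots, S_N$, with the effect $m^{i,x_i}_{y_i}$ applied to system $S_i$]   (9)

For brevity, we will often use the notation $m^{\mathbf{x}}_{\mathbf{y}}$ to
denote the product effect

$$
m^{\mathbf{x}}_{\mathbf{y}} := m^{1,x_1}_{y_1} \otimes m^{2,x_2}_{y_2} \otimes \cdots \otimes m^{N,x_N}_{y_N} .
\tag{10}
$$

Accordingly, we will write Eq. (8) in the compact form

$$
p(\mathbf{y}|\mathbf{x}) = \left( m^{\mathbf{x}}_{\mathbf{y}} \,\middle|\, \rho \right) .
\tag{11}
$$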

Once a physical theory has been specified, the goal of the players is to find
the best state $\rho$ and the best measurements that maximize the expected
payoff $\omega$, given by Eq. (1). For a given theory T, we denote by
$\omega_T$ the maximum payoff that can be obtained by optimizing over all
possible states and measurements allowed in T.


5. Causality, no-signalling, and conditional tests 

In general, the probability distribution p(y|x) in Eq. (9) can allow for
signalling. In the framework of operational-probabilistic theories,
No-Signalling is imposed by the Causality principle, stating that the
probability of an outcome at a given step in a circuit is independent of the
choice of tests performed at later steps. Precisely, the principle can be
stated as follows:

Definition 1 (Causality [37, ...]). A theory satisfies causality iff for every
system S, every preparation-test $\rho = \{\rho_x\}_{x \in X}$ for system S,
and every two measurements $\mathbf{m}^0 = \{m^0_{y_0}\}_{y_0 \in Y_0}$ and
$\mathbf{m}^1 = \{m^1_{y_1}\}_{y_1 \in Y_1}$ on system S, the conditional
probability distributions $p(x, y_z | z) := (m^z_{y_z} | \rho_x)$ satisfy the
condition

$$
\sum_{y_0 \in Y_0} p(x, y_0 | 0) = \sum_{y_1 \in Y_1} p(x, y_1 | 1) \qquad \forall x \in X .
\tag{12}
$$


Informally, Eq. (12) expresses a condition of No-Signalling from the future:
the (marginal) probability of a preparation does not depend on the choice of
measurement.

Causality is equivalent to the requirement that for every system S there
exists a unique effect $u_S$, called the unit effect, such that

$$
\sum_{y \in Y} m_y = u_S
\tag{13}
$$

for every measurement $\{m_y\}_{y \in Y}$ on S. When there is no ambiguity, we
will drop the subscript from $u_S$. In quantum theory, $u_S$ is the identity
operator on the Hilbert space of the system and Eq. (13) expresses the fact
that quantum measurements are resolutions of the identity [cf. Eq. (5)].
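
For the quantum case, both statements can be verified on a small example. The sketch below checks that two different qubit measurements sum to the same unit effect (the identity), as in Eq. (13), and that the marginal probability of each preparation is the same for both measurements, as in Eq. (12). The specific ensemble and the two POVMs are illustrative assumptions, and the helper names are hypothetical.

```python
def trace_of_product(a, b):
    """Tr[a b] for two square matrices of the same size."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

def matrix_sum(matrices):
    """Entry-wise sum of a non-empty list of matrices of the same shape."""
    n, m = len(matrices[0]), len(matrices[0][0])
    return [[sum(mat[i][j] for mat in matrices) for j in range(m)]
            for i in range(n)]

# Assumed ensemble of unnormalized density matrices (traces sum to 1):
# |0><0| and |+><+|, each prepared with probability 1/2.
ensemble = {
    "a": [[0.5, 0.0], [0.0, 0.0]],
    "b": [[0.25, 0.25], [0.25, 0.25]],
}

# Two assumed measurements on the same qubit: Z basis (z = 0), X basis (z = 1).
m0 = {0: [[1.0, 0.0], [0.0, 0.0]], 1: [[0.0, 0.0], [0.0, 1.0]]}
m1 = {0: [[0.5, 0.5], [0.5, 0.5]], 1: [[0.5, -0.5], [-0.5, 0.5]]}

def marginal(rho_x, measurement):
    """Sum over outcomes y of p(x, y | z) = Tr[m_y rho_x] (Born rule)."""
    return sum(trace_of_product(effect, rho_x) for effect in measurement.values())

unit_from_m0 = matrix_sum(list(m0.values()))
unit_from_m1 = matrix_sum(list(m1.values()))
marginals_m0 = {x: marginal(rho_x, m0) for x, rho_x in ensemble.items()}
marginals_m1 = {x: marginal(rho_x, m1) for x, rho_x in ensemble.items()}
```

In this example both measurements sum to the identity matrix and both sets of marginals coincide with the traces of the unnormalized states, which is the quantum instance of the condition that the choice of measurement cannot influence the probability of an earlier preparation.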

Causality is equivalent to the statement that for every system S there exists
a unique deterministic effect $u_S \in \mathrm{Eff}_1(S)$ [33]. In categorical
terms, this condition is the terminality of the tensor unit (the trivial
system, in our language) and defines a special class of categories called
causal categories [?].


5.1. Causality and No-Signalling 

Causality implies that the probability distributions p(y|x) generated by local
measurements as in Eq. (9) satisfy the no-signalling condition (cf. theorem 1
of Ref. [37] and theorem 5.1 of Ref. [70]). In fact, under a minimalistic
assumption, Causality is equivalent to the request that all the probability
distributions of the form of Eq. (9) are no-signalling. The assumption is that
every ensemble of states can be generated by performing a measurement on one
side of a bipartite state:


Assumption 1 (cf. Axiom 2 of [?]). For every system A and for every ensemble
$\{\rho_x\}_{x \in X}$, describing a random preparation of A, there exists a
system B, a deterministic state $\sigma \in \mathrm{St}_1(A \otimes B)$ and a
measurement $\{b_x\}_{x \in X}$ such that

$$
|\rho_x) = (\mathcal{I}_A \otimes b_x) \, |\sigma) \qquad \forall x \in X .
\tag{14}
$$


6 We adopt this terminology to facilitate the comparison of our framework with the convex
set framework [24, 34, 38], where the existence and uniqueness of the unit effect (and therefore
the validity of Causality) is built in.











This assumption is so natural that it could even be included in the definition
of OPT: indeed, one can think of system B in Eq. (14) as the physical support
that carries the classical information about the outcome x, information which
is read out by performing the measurement $\{b_x\}_{x\in X}$. If Eq. (14) were not to
hold, we could not represent the outcome x as information carried by an actual
physical system. Note that this observation applies not only to ensembles of 
states, but also to generic tests with non-trivial input and non-trivial output. 

Under the validity of Assumption 1, Causality and No-Signalling are
equivalent:


Proposition 1. For every theory satisfying Assumption 1 the following
conditions are equivalent:

1. the theory is causal

2. every input-output probability distribution p(y|x) generated as in Eq. (9) is
no-signalling.
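
To make the second condition concrete, here is a minimal sketch of a no-signalling test for a bipartite table p(a, b | x, y). The array layout `p[a, b, x, y]`, the function name `is_no_signalling`, and the two example boxes are illustrative assumptions, not constructions taken from the paper.

```python
import itertools
import numpy as np

def is_no_signalling(p, tol=1e-9):
    """Check no-signalling for a bipartite table p[a, b, x, y] = p(a, b | x, y)."""
    alice = p.sum(axis=1)  # p(a | x, y), indexed as [a, x, y]
    bob = p.sum(axis=0)    # p(b | x, y), indexed as [b, x, y]
    # Alice's marginal must not depend on y, Bob's marginal must not depend on x.
    alice_ok = np.allclose(alice, alice[:, :, :1], atol=tol)
    bob_ok = np.allclose(bob, bob[:, :1, :], atol=tol)
    return bool(alice_ok and bob_ok)

pr_box = np.zeros((2, 2, 2, 2))
signalling_box = np.zeros((2, 2, 2, 2))
for a, b, x, y in itertools.product(range(2), repeat=4):
    # PR box: outputs are uniformly random subject to a XOR b = x AND y.
    pr_box[a, b, x, y] = 0.5 if (a ^ b) == x * y else 0.0
    # Signalling box: Bob's output simply reveals Alice's input.
    signalling_box[a, b, x, y] = 1.0 if (a == 0 and b == x) else 0.0
```

The PR box passes the test, while the second box fails it because Bob's marginal depends on Alice's input.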


The proof is rather elementary and is provided in Appendix A. As a
consequence, maximizing the payoff of a nonlocal game over all possible theories that
satisfy Causality is equivalent to maximizing the payoff $\omega$ over all possible
conditional distributions that satisfy No-Signalling. The relation between
Causality and No-Signalling has recently played an important role in the study
of network scenarios inspired by Pearl’s notion of causal networks [71, 72] and
of the entropic relations implied by causal networks in operational-probabilistic 
theories ful. 


5.2. Causality and conditional tests 

Thanks to Causality, one can use the information gained in the past to
decide which tests are performed in the future, thus implementing conditional
tests. Conditional tests are defined as follows: If $\mathcal{T} = \{\mathcal{T}_x\}_{x\in X}$ is a test with
input A and output B and, for every x, $\mathcal{S}^x = \{\mathcal{S}^x_y\}_{y\in Y}$ is a test with input B
and output C, then the conditional test $\{\mathcal{S}^x_y\mathcal{T}_x\}_{x\in X, y\in Y}$ is the test
that results from performing the test $\mathcal{T}$ and, conditionally on outcome x, the
test $\mathcal{S}^x$, as in the diagram

[Diagram: the test $\mathcal{T}_x$ from A to B, followed by the test $\mathcal{S}^x_y$ from B to C.]


Causality guarantees that the collection of transformations $\{\mathcal{S}^x_y\mathcal{T}_x\}_{x\in X, y\in Y}$ can
be included among the tests allowed by the theory without generating
contradictions [37]. Since they can be included, one may as well assume that they are
included, which amounts to the following


Assumption 2. For every test $\mathcal{T} = \{\mathcal{T}_x\}_{x\in X}$ of type $A \to B$ and for every set
of tests $\mathcal{S}^x = \{\mathcal{S}^x_y\}_{y\in Y}$, $x\in X$, of type $B \to C$, the collection of transformations
$\{\mathcal{S}^x_y\mathcal{T}_x\}_{x\in X, y\in Y}$ is a test of type $A \to C$.

Quite importantly, Assumption 2 implies Causality (cf. lemma 7 of Ref.
[37]): only in a causal world can the agent freely choose future tests depending
on the outcomes of previous ones. From now on, we will assume Assumption 2
and convexity as part of the Causality package: by Causal theory, we
will mean a theory satisfying Assumption 2.


6. Spiky measurements 

In this section we define a privileged class of measurements, which we call 
spiky measurements. In Quantum Theory, spiky measurements coincide with 
projective measurements, i.e. measurements consisting of projectors on a
complete set of orthogonal subspaces.

6.1. Purity 

A pure transformation $\mathcal{P}$ is a transformation that cannot be obtained from
the coarse-graining of two different transformations $\mathcal{P}_1$ and $\mathcal{P}_2$: precisely, the
transformation $\mathcal{P}$ is pure iff one has

$$\mathcal{P} = \mathcal{P}_1 + \mathcal{P}_2 \;\Longrightarrow\; \mathcal{P}_1 = p\,\mathcal{P}, \quad \mathcal{P}_2 = (1-p)\,\mathcal{P}, \qquad p \in [0,1].$$

Intuitively, the pure transformations are those for which the evolution of the
system is known with the maximal accuracy allowed by the theory. In quantum
theory, the pure transformations are those with a single Kraus operator, i.e.
those of the form $\mathcal{P}(\rho) = M\rho M^\dagger$, for some operator M satisfying $M^\dagger M \le I_S$,
$I_S$ being the identity on the system’s Hilbert space.
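
For the quantum case, purity can be tested numerically: a completely positive map admits a single Kraus operator exactly when its Choi operator has rank one. The sketch below uses this standard criterion; the function names and the example maps are assumptions introduced here for illustration.

```python
import numpy as np

def choi_operator(kraus_ops):
    """Sum of |K>><<K| over the Kraus operators, with |K>> the flattened matrix K."""
    vecs = [np.asarray(K, dtype=complex).reshape(-1, 1) for K in kraus_ops]
    return sum(v @ v.conj().T for v in vecs)

def is_trace_non_increasing(kraus_ops, tol=1e-9):
    """Check the condition sum_k K_k^dagger K_k <= I."""
    d = np.asarray(kraus_ops[0]).shape[1]
    total = sum(np.asarray(K).conj().T @ np.asarray(K) for K in kraus_ops)
    return bool(np.linalg.eigvalsh(np.eye(d) - total).min() >= -tol)

def is_pure_transformation(kraus_ops, tol=1e-9):
    """True iff the Choi operator has rank one, i.e. a single Kraus operator suffices."""
    return int(np.linalg.matrix_rank(choi_operator(kraus_ops), tol=tol)) == 1

M = np.diag([1.0, 0.5])
Z = np.diag([1.0, -1.0])
single_kraus = [M]                                        # P(rho) = M rho M^dagger
trivial_split = [np.sqrt(0.3) * M, np.sqrt(0.7) * M]      # same map, split as pP + (1-p)P
dephasing = [np.sqrt(0.5) * np.eye(2), np.sqrt(0.5) * Z]  # genuinely mixed map
```

The map `trivial_split` is still pure, because its two terms are proportional to the same transformation, in line with the definition above.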

As a particular case of pure transformations, one can consider pure states 
and pure effects. A pure state is just a pure transformation with trivial input. 
A pure effect is a pure transformation with trivial output. In quantum theory, 
pure states and pure effects are proportional to rank-one projectors. Using the 
notion of pure effect, it is natural to define pure measurements: 

Definition 2. We say that a measurement m is pure iff it consists of pure 
effects. 

Intuitively, a pure measurement extracts information in a way that cannot
be further refined. For example, for a three-level quantum system, the
computational basis measurement $\{|0\rangle\langle 0|, |1\rangle\langle 1|, |2\rangle\langle 2|\}$ is pure, while the two-outcome
projective measurement $\{|0\rangle\langle 0|, |1\rangle\langle 1| + |2\rangle\langle 2|\}$ is not pure, since it can be
obtained from the former by coarse-graining.
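
The qutrit example can be checked in a few lines, using the fact stated above that pure quantum effects are proportional to rank-one projectors. The helper names `projector` and `is_pure_measurement` are hypothetical.

```python
import numpy as np

def projector(*levels, dim=3):
    """Projector onto the span of the given computational basis levels."""
    P = np.zeros((dim, dim))
    for k in levels:
        P[k, k] = 1.0
    return P

def is_pure_measurement(effects, tol=1e-9):
    """In quantum theory a nonzero effect is pure iff it has rank one."""
    return all(int(np.linalg.matrix_rank(E, tol=tol)) == 1 for E in effects)

fine_grained = [projector(0), projector(1), projector(2)]
coarse_grained = [projector(0), projector(1, 2)]
```

The second measurement is the coarse-graining of the first obtained by merging outcomes 1 and 2.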

6.2. Orthogonality 

In addition to purity, another desirable feature of measurements is
orthogonality. We say that a measurement is orthogonal if it can perfectly distinguish
among the states in a given set:

Definition 3. A measurement on system S, say $m = \{m_y\}_{y\in Y}$, is orthogonal
iff there exists a set of states, say $\{\rho_y\}_{y\in Y}$, such that

$$(m_y|\rho_{y'}) = \delta_{y,y'} \qquad \forall y, y' \in Y. \tag{15}$$

This notion of orthogonality can be easily extended to sets of effects that do 
not necessarily form a measurement: 

Definition 4 (Orthogonality of states and effects). A set of effects $\{m_y\}_{y\in Y} \subset \mathsf{Eff}(S)$
and a set of states $\{\rho_y\}_{y\in Y} \subset \mathsf{St}(S)$ are biorthogonal iff

$$(m_y|\rho_{y'}) = \delta_{y,y'}$$

for every $y, y' \in Y$. A set of effects $\{m_y\}$ is orthogonal iff there exists a set
of states $\{\rho_y\}$ such that the two sets are biorthogonal. A set of states $\{\rho_y\}$
is orthogonal iff there exists a set of effects $\{m_y\}$ such that the two sets are
biorthogonal.

The familiar example of Quantum Theory should not mislead the reader. In 
this paper we do not define orthogonal states as states that can be perfectly 
distinguished by a measurement. Distinguishability implies orthogonality, but 
in general the converse does not hold: if the states $\{\rho_y\}_{y\in Y}$ are orthogonal, this
only means that there exist effects $\{m_y\}_{y\in Y}$ such that $(m_y|\rho_{y'}) = \delta_{y,y'}$, but
in general the effects $\{m_y\}_{y\in Y}$ may not form a measurement (see footnote 7). Nevertheless,
orthogonality and distinguishability are equivalent notions for pairs of states:

Proposition 2. Two states $\rho_0$ and $\rho_1$ are orthogonal if and only if they are
perfectly distinguishable. 

The proof is elementary and is provided in Appendix B. The above
proposition shows that orthogonality for pairs of states is a very special notion.

Note that pairwise orthogonality does not imply orthogonality: The
condition that two states $\rho_y$ and $\rho_{y'}$ are orthogonal for every $y, y' \in Y$, $y \neq y'$ is
not enough to guarantee that the states $\{\rho_y\}_{y\in Y}$ are orthogonal. The canonical
counterexample is the square bit Izllzil, discussed in the following: 

Example 1 (The square bit). Consider a physical system whose deterministic 
states form a square. Suppose that the measurements are represented as positive 
affine functionals summing up to the functional that gives 1 on every point of
the square (see footnote 8).


7 Recall that the set of measurements is part of the specification of the theory. 

8 For the purpose of this example, we only need to declare the states and the effects of 
the system. We omit the specification of the full OPT in which the square bit lives. As a 
matter of fact, there are many different OPTs that contain “square bits” among their systems. 
For example, consider an OPT where the systems are composite systems of square bits, the 
states are convex combinations of product states, the measurements are those that can be 
implemented by (coarse-grainings of) measurements on individual square bits, and the general 
tests are those of the “measure-and-prepare” form, i.e. those consisting in measuring the input
system and preparing an output state depending on the outcome of the measurement.

The square bit has four pure states $\{\varphi_y\}_{y=1}^4$ and four pure effects $\{a_y\}_{y=1}^4$, given by the vectors

[Explicit column vectors for $\varphi_y$ and $a_y$: not recoverable from the source.]

respectively. The probabilities are given by the scalar product of vectors, yielding

$$(a_y|\varphi_y) = (a_{y\oplus 1}|\varphi_y) = 1, \qquad (a_{y\oplus 2}|\varphi_y) = (a_{y\oplus 3}|\varphi_y) = 0 \qquad \forall y = 1, 2, 3, 4, \tag{16}$$


where $\oplus$ denotes the addition modulo 4. Here there are two pure measurements,
namely $\{a_1, a_3\}$ and $\{a_2, a_4\}$. Indeed, it is easy to check that $a_1 + a_3 = a_2 + a_4 = u$,
where u is the deterministic effect

[Explicit column vector for $u$: not recoverable from the source.]

giving probability 1 on every pure state. It is not hard to see that the four
pure states $\{\varphi_y\}_{y=1}^4$ are pairwise orthogonal, but not orthogonal (and, therefore,
not perfectly distinguishable). Similarly, the four effects $\{a_y\}_{y=1}^4$ are pairwise
orthogonal, but not orthogonal.
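
The claims of this example can be checked numerically. The sketch below assumes one common coordinate choice for the square bit (the n = 4 case of the regular-polygon parametrization); these coordinates are an assumption of this illustration and need not coincide with the vectors used in the original text, but they reproduce the probabilities of Eq. (16).

```python
import numpy as np

R_SQ = 1.0 / np.cos(np.pi / 4)  # squared radius in the assumed parametrization
R = np.sqrt(R_SQ)

def state(y):
    """Pure state phi_y for y = 1, ..., 4 (assumed coordinates)."""
    return np.array([R * np.cos(np.pi * y / 2), R * np.sin(np.pi * y / 2), 1.0])

def effect(y):
    """Pure effect a_y for y = 1, ..., 4 (assumed coordinates)."""
    angle = (2 * y - 1) * np.pi / 4
    return 0.5 * np.array([R * np.cos(angle), R * np.sin(angle), 1.0])

labels = [1, 2, 3, 4]
# table[i, j] = (a_{i+1} | phi_{j+1}): probability of effect a_{i+1} on state phi_{j+1}
table = np.array([[effect(a) @ state(s) for s in labels] for a in labels])

# Eq. (16): a_y gives probability 1 on phi_y and phi_{y-1}, and 0 on the other two states.
expected = np.array([[1.0 if (i - j) % 4 in (0, 1) else 0.0 for j in range(4)]
                     for i in range(4)])

unit_13 = effect(1) + effect(3)   # first pure measurement {a_1, a_3}
unit_24 = effect(2) + effect(4)   # second pure measurement {a_2, a_4}
```

In these coordinates one also finds `state(1) + state(3) == state(2) + state(4)`. By linearity, no effect can give probability 1 on one pure state and 0 on the other three, which is why the four states are pairwise orthogonal without being orthogonal as a set.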

Finally, note that two orthogonal effects, as defined in Definition 4, may not
coexist in a measurement. An easy counterexample can be found in quantum
theory. Consider the two projectors $P_1 = |0\rangle\langle 0| + |1\rangle\langle 1|$ and $P_2 = |0\rangle\langle 0| + |2\rangle\langle 2|$.
The two projectors correspond to orthogonal effects in the sense of Definition 4:
indeed, there exist two states $\rho_1 = |1\rangle\langle 1|$ and $\rho_2 = |2\rangle\langle 2|$ such that $\mathrm{Tr}[P_i \rho_j] = \delta_{ij}$,
for every i and j in {1, 2}. However, $P_1$ and $P_2$ cannot coexist in the
same measurement, because for $\rho_0 = |0\rangle\langle 0|$ one has $\mathrm{Tr}[P_1\rho_0] + \mathrm{Tr}[P_2\rho_0] = 2$, in
contradiction with the normalization of probabilities.
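
This counterexample is easy to verify numerically; the following sketch simply evaluates the traces quoted above (the variable names are chosen here for illustration).

```python
import numpy as np

def ketbra(k, dim=3):
    """Rank-one projector |k><k| on a dim-level system."""
    P = np.zeros((dim, dim))
    P[k, k] = 1.0
    return P

P1 = ketbra(0) + ketbra(1)
P2 = ketbra(0) + ketbra(2)
rho0, rho1, rho2 = ketbra(0), ketbra(1), ketbra(2)

# Biorthogonality table Tr[P_i rho_j] for i, j in {1, 2}
gram = np.array([[np.trace(P @ rho) for rho in (rho1, rho2)] for P in (P1, P2)])

# Total probability that P1 and P2 would assign to rho0 inside a single measurement
total_on_rho0 = np.trace(P1 @ rho0) + np.trace(P2 @ rho0)
largest_eigenvalue = np.linalg.eigvalsh(P1 + P2).max()
```

The table `gram` is the identity, while `P1 + P2` has an eigenvalue larger than one, so the two effects cannot belong to the same resolution of the identity.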

6.3. Purity plus orthogonality

We are now ready to define the notion of pure orthogonal measurement: 

Definition 5. A pure and orthogonal measurement is an orthogonal
measurement consisting of pure effects.

In Quantum Theory, the pure orthogonal measurements are the
measurements consisting of rank-one projectors on the vectors of an orthonormal basis.
Pure orthogonal measurements featured in a recent work [76], where the authors
explored different inequivalent notions of dimension of a physical system. In that
work, the (maximum) number of outcomes in a pure orthogonal measurement
was called the measurement dimension of the system.

To appreciate the meaning of Definition 5 outside the quantum context, it
is worth having a look at the square bit of Example 1. Here, each of the two pure
measurements $\{a_1, a_3\}$ and $\{a_2, a_4\}$ is orthogonal: for example, $\{a_1, a_3\}$ allows
one to distinguish perfectly between the two states $\rho_1$ and $\rho_3$ defined as

$$\rho_1 = p\,\varphi_1 + (1-p)\,\varphi_4, \qquad \rho_3 = q\,\varphi_3 + (1-q)\,\varphi_2,$$


where p and q are arbitrary probabilities. Note that here a pure effect can give 
probability 1 on a mixed state. In this respect, the square bit differs radically 
from the quantum bit, where a pure effect can give probability 1 on one and only 
one pure state. The one-to-one correspondence between pure states and effects is 
a non-trivial property, which played an important role in several reconstructions 
of Quantum Theory 77, 2l|, 2*1 23], 22], 28], 27 and will also play a role in the 
present paper. 


6.4. Spiky measurements

We are now ready to define the set of spiky measurements: 

Definition 6 (Spiky measurement). A measurement $m = \{m_y\}_{y\in Y}$ is spiky (see footnote 9)
iff it is the coarse-graining of a pure orthogonal measurement, i.e. iff there exists
a pure orthogonal measurement $a = \{a_z\}_{z\in Z}$ and a partition of the outcome set
Z into disjoint subsets $\{Z_y\}_{y\in Y}$ such that

$$(m_y| = \sum_{z\in Z_y} (a_z|.$$

The above definition of “spiky” measurements is equivalent to the definition
of “sharp” measurements by Barnum, Müller, and Ududec [78]. In this paper
we prefer to avoid the term “sharp”, because we would like to reserve it for
measurements that are repeatable and minimally disturbing [1], this being a
property traditionally associated with sharp measurements in quantum theory
[79, 80]. Admittedly, the choice of terminology is mostly a matter of taste
here, since in quantum theory the two definitions coincide and single out the set of
projective quantum measurements.


7. Axioms 

Here we present three requirements about spiky measurements. These three 
requirements are satisfied by both classical and quantum theory and imply the 
validity of Causality and Local Orthogonality. 


9 Here the “spikes” are the pure orthogonal effects $a_z$, $z \in Z$.

7.1. Measurement Purification

Measurement Purification is the statement that every measurement can be 
reduced to a spiky measurement performed jointly on the system and on an 
environment: 


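Axiom 1 (Measurement Purification). For every system S and for every
measurement on system S, say $m = \{m_y\}_{y\in Y}$, there is another system E, a state
$\sigma \in \mathsf{St}(E)$, and a spiky measurement $M = \{M_y\}_{y\in Y}$ on $S \otimes E$ such that

$$(m_y| = (M_y|\,\big[\mathcal{I}_S \otimes |\sigma)\big] \qquad \forall y \in Y.$$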


Roughly speaking, one can think of the above axiom as an operational
version of Naimark’s theorem for finite dimensional quantum systems [81, 82],
which states that every quantum measurement can be dilated to a projective
measurement performed jointly on the system and on an environment (see footnote 10).

The idea that arbitrary measurements can be reduced to ideal measurements
by introducing an environment is immediately reminiscent of the Purification
Principle [ail il Hail 113 , which states that arbitrary states can be reduced to pure
states by adding an environment. In this sense, the spirit of this paper is akin to
the “purification philosophy” of Refs. [.37, UcJ HI, Ej, [lo] namely the idea that
all physical processes can be reduced to ideal processes by including additional
systems into the description. The interaction with an environment is a
powerful structure also in the abstract framework of categorical quantum mechanics
[85, 86], where it leads to an axiomatization of Selinger’s CPM construction [87].


We now give an elementary consequence of measurement purification that 
will be useful later: 


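Lemma 1. Let $\{m^x\}_{x\in X}$ be a finite set of measurements labelled by a parameter
$x \in X$. If measurement purification holds, then there exists a system E, a state
$\sigma \in \mathsf{St}(E)$, and a set of spiky measurements $\{M^x\}_{x\in X}$ such that

$$(m^x_y| = (M^x_y|\,\big[\mathcal{I}_S \otimes |\sigma)\big] \qquad \forall y \in Y, \ \forall x \in X.$$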


The proof is provided in Appendix C. Compared to the Measurement
Purification axiom, the above lemma only adds the fact that the system E and
the state $\sigma \in \mathsf{St}(E)$ can be chosen to be independent of the setting $x \in X$, the
dependence on the setting being only in the orthogonal measurement $M^x$.


7.2. Locality of Pure Orthogonal Measurements 

10 Note that Naimark’s theorem also includes the fact that the state of the environment is
pure and that the dilation is unique, up to partial isometries. These two additional facts are
also important, but not for the purpose of the present paper.

Measurement Purification can be interpreted as the statement that, at the
most fundamental level, measurements are generated by the coarse-graining of
pure orthogonal measurements. If one pushes this requirement further, it is
natural to ask that the product of two pure orthogonal measurements is a pure
orthogonal measurement on the composite system:

Axiom 2 (Locality of Pure Orthogonal Measurements). If m = {m x } and n = 
{n y } are pure orthogonal measurements on two systems A and B, respectively, 
then their product $m \otimes n = \{m_x \otimes n_y\}$ is a pure orthogonal measurement on the
composite system $A \otimes B$.


Two comments are in order: 


1. Locality of Pure Orthogonal Measurements may superficially look like a
consequence of the definition. And in part it is: clearly, the product of two
orthogonal measurements is orthogonal. The part that does not follow
from the definition is the purity of the product effects. This condition
would be guaranteed by Local Tomography [ 24 |, 0 , 34 1 , which is not
assumed here. The Locality of Pure Orthogonal Measurements is a much
weaker condition than Local Tomography: for example, it is satisfied by
Quantum Theory on real Hilbert spaces [88, 89, 90], a well-known example
of a theory wherein Local Tomography fails to hold.

2. If one postulates Measurement Purification, then it is natural to assume 
the Locality of Pure Orthogonal Measurements as well. Indeed, suppose 
that two distant parties, Alice and Bob, perform two pure orthogonal 
measurements m and n on their systems. By Measurement Purification, 
we know that the product measurement $m \otimes n$ can be reduced to a pure
orthogonal measurement (call it M) performed jointly on A, B, and an
environment E. If the measurement M could not be chosen of the product 
form, it would mean that the measurements that are performed indepen¬ 
dently by Alice and Bob would require some nonlocal interaction at the 
fundamental level. 


7.3. Sufficient Orthogonality 

Here we introduce Sufficient Orthogonality (SO), a structural property of the 
measurements allowed by the theory. We do not attach a particular operational 
meaning to this property, e.g. we do not argue that this should be a fundamental 
principle of physics. Nevertheless, we show that SO plays a key role, allowing 
one to derive LO and CE. We view SO as an intermediate step, which can be 
used to reduce LO and CE to other, more fundamental features of physical 
processes, such as the Strong No Disturbance Without Information principle 
discussed in Section 10.

Axiom 3 (Sufficient Orthogonality). Every set of pure orthogonal effects can
coexist in a measurement, i.e. for every set of pure orthogonal effects $\{a_y\}_{y\in Y}$
there exists a measurement m such that $\{a_y\}_{y\in Y} \subseteq m$.

In the statement of SO it is essential that the effects $\{a_y\}$ are pure,
otherwise one can find counterexamples even in classical and quantum theory. For
example, the non-pure effects $P_1 = |0\rangle\langle 0| + |1\rangle\langle 1|$ and $P_2 = |0\rangle\langle 0| + |2\rangle\langle 2|$ are
orthogonal, because $\mathrm{Tr}[P_i\rho_j] = \delta_{ij}$ for i, j = 1, 2 and $\rho_j = |j\rangle\langle j|$. However,
they cannot coexist in a quantum measurement, since $P_1 + P_2 > I$.

SO is satisfied by both classical and quantum probability theory, where a
set of pure orthogonal effects is a set of rank-one projectors $\{P_y\}_{y\in Y}$ satisfying
$\mathrm{Tr}[P_y P_{y'}] = \delta_{y,y'}$. Interestingly, SO is violated by the square bit of Example
1. More generally, SO is violated by all systems whose state space is a regular
polygon of n > 3 vertices:

Example 2 (Regular polygons [zH). Consider a hypothetical physical system
$S_n$ whose deterministic states $\mathsf{St}_1(S_n)$ form a regular polygon of n vertices. The
vertices of the polygon are the pure states and can be represented by the real
vectors

$$|\varphi_y) = \begin{pmatrix} r_n \cos\frac{2\pi y}{n} \\ r_n \sin\frac{2\pi y}{n} \\ 1 \end{pmatrix}, \qquad r_n := \sqrt{\frac{1}{\cos(\pi/n)}},$$

for $y = 0, 1, \dots, n-1$. Effects are also represented as real vectors, and the
probability of an effect on a state is given by the scalar product. The unit effect,
which has probability 1 on every pure state, is represented as

$$(u| = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$

For the measurements, one typically assumes the no-restriction hypothesis (cf.
Ref. [91], definition 16, and Ref. [92], section III), which states that all
collections of positive affine functionals summing up to the unit represent allowed
measurements (see footnote 11). Under this hypothesis, one has the pure effects

$$(a_y| = \frac{1}{2}\begin{pmatrix} r_n \cos\frac{(2y-1)\pi}{n} \\ r_n \sin\frac{(2y-1)\pi}{n} \\ 1 \end{pmatrix}$$

for even n and

$$(a_y| = \frac{1}{1+r_n^2}\begin{pmatrix} r_n \cos\frac{2\pi y}{n} \\ r_n \sin\frac{2\pi y}{n} \\ 1 \end{pmatrix}$$

for odd n. Note that for every y one has $(a_y|\varphi_y) = 1$.

11 We do not specify here the full OPT in which the regular polygon is included. In general,
it is easy to include a given system S, its state space, and its set of allowed measurements
into a full-blown OPT. For example, one can consider the OPT where all systems consist
of multiple copies of S, the states are product states (or convex combinations thereof), the
measurements are product measurements (or convex combinations thereof), and the general
tests are of the measure-and-prepare form. Unless one imposes additional physical constraints,
the specification of the OPT in which a given system can be embedded is highly non-unique.
Still, such a specification is irrelevant for the purposes of the present example.

For n = 3, the three effects $\{a_y\}$ can coexist in a measurement, and, therefore,
all the three pure states are perfectly distinguishable. This is not surprising,
because the triangle is a simplex, simplices represent the states of classical systems,
and classical systems satisfy SO.

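We now show that the triangle is the only regular polygon satisfying SO. Let
us start from the case of even n. Here, the inner product between a pure effect
and a pure state is given by

$$(a_y|\varphi_{y'}) = \frac{1}{2}\left[ r_n^2 \cos\frac{(2y - 2y' - 1)\pi}{n} + 1 \right]$$

and it is immediate to check that one has

$$(a_y|\varphi_y) = (a_{y\oplus 1}|\varphi_y) = 1$$

$$(a_{y\oplus\frac{n}{2}}|\varphi_y) = (a_{y\oplus\frac{n}{2}\oplus 1}|\varphi_y) = 0 \qquad \forall y \in \{0, 1, \dots, n-1\},$$

where $\oplus$ denotes the addition modulo n. Clearly, the pure effects $\{a_j, a_{j\oplus\frac{n}{2}\oplus 1}\}$
are orthogonal, since they are biorthogonal to the pure states $\{\varphi_j, \varphi_{j\oplus\frac{n}{2}}\}$.
However, they cannot coexist in a measurement: by contradiction, if they coexisted in a
measurement, the total probability of the measurement outcomes on the state
$|\varphi_{j\oplus\frac{n}{2}\oplus 1})$ would exceed one:

$$(a_j|\varphi_{j\oplus\frac{n}{2}\oplus 1}) + (a_{j\oplus\frac{n}{2}\oplus 1}|\varphi_{j\oplus\frac{n}{2}\oplus 1}) = (a_j|\varphi_{j\oplus\frac{n}{2}\oplus 1}) + 1 > 1.$$

Hence, every polygon with an even number of vertices n > 3 violates SO.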

For an odd number of vertices, the inner product of a pure effect with a pure
state is

$$(a_y|\varphi_{y'}) = \frac{1}{1+r_n^2}\left[ r_n^2 \cos\frac{(2y - 2y')\pi}{n} + 1 \right] \tag{17}$$

and one has

$$(a_y|\varphi_y) = 1$$

$$(a_{y\oplus\frac{n+1}{2}}|\varphi_y) = (a_{y\oplus\frac{n-1}{2}}|\varphi_y) = 0 \qquad \forall y \in \{0, 1, \dots, n-1\}.$$

Clearly, the two effects $\{a_y, a_{y\oplus\frac{n+1}{2}}\}$ are orthogonal, as they are biorthogonal
to the states $\{\varphi_y, \varphi_{y\oplus\frac{n+1}{2}}\}$. However, for n > 3 they cannot coexist in a measurement,
because the sum of their probabilities on a suitable pure state exceeds one, as shown
in Appendix D. In summary, the only regular polygon compatible with SO is
the triangle, representing a three-level classical system.
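
The violation of SO by regular polygons can also be checked numerically. The sketch below assumes the standard regular-polygon parametrization, with prefactors fixed by the requirement that each pure effect gives probability 1 on the state with the same label. The function names are hypothetical, and the pair of effects tested is a_0 together with a_k for k = n/2 + 1 (even n) or k = (n+1)/2 (odd n).

```python
import numpy as np

def polygon(n):
    """States and pure effects of the n-gon system, as rows of two arrays."""
    r_sq = 1.0 / np.cos(np.pi / n)
    r = np.sqrt(r_sq)
    y = np.arange(n)
    ones = np.ones(n)
    states = np.stack([r * np.cos(2 * np.pi * y / n),
                       r * np.sin(2 * np.pi * y / n), ones], axis=1)
    if n % 2 == 0:
        angle = (2 * y - 1) * np.pi / n
        effects = 0.5 * np.stack([r * np.cos(angle), r * np.sin(angle), ones], axis=1)
    else:
        angle = 2 * np.pi * y / n
        effects = np.stack([r * np.cos(angle), r * np.sin(angle), ones], axis=1) / (1.0 + r_sq)
    return states, effects

def check_sufficient_orthogonality(n, tol=1e-9):
    """Return (is_orthogonal_pair, max_total) for the effects a_0 and a_k, k = n // 2 + 1."""
    states, effects = polygon(n)
    table = effects @ states.T            # table[y, y'] = (a_y | phi_y')
    k = (n // 2 + 1) % n
    first, second = table[0], table[k]
    # The pair is orthogonal if one state gives (1, 0) and another state gives (0, 1).
    has_10 = np.any((np.abs(first - 1) < tol) & (np.abs(second) < tol))
    has_01 = np.any((np.abs(first) < tol) & (np.abs(second - 1) < tol))
    # Coexistence in one measurement requires first + second <= 1 on every state.
    return bool(has_10 and has_01), float((first + second).max())
```

For the triangle the largest total probability is exactly one, while for every n between 4 and 12 the pair is orthogonal and the total exceeds one on some pure state, in agreement with the argument above.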
8. Local Orthogonality

Here we briefly discuss LO, a requirement on the conditional probability
distributions p(y|x) generated by N players of a non-local game. To state the
requirement, it is handy to introduce a notation for the output/input pairs
e = (x, y), which will be called events.

Definition 7 (Locally orthogonal events). Two events e = (x, y) and e' =
(x', y') are locally orthogonal, denoted as $e \perp e'$, if there exists at least one
party i such that $x_i = x'_i$ and $y_i \neq y'_i$. A set of events $\mathcal{O}$ is locally orthogonal if
every two elements in $\mathcal{O}$ are locally orthogonal.


For an event e = (x, y), we use the notation p(e) := p(y|x). With this
notation, LO is defined as follows:

Definition 8 (Local Orthogonality [n], 03). A conditional probability
distribution p(y|x) satisfies Local Orthogonality iff one has

$$\sum_{e\in\mathcal{O}} p(e) \le 1 \tag{18}$$

for every locally orthogonal set $\mathcal{O}$. A theory satisfies LO iff every probability
distribution generated as in Eq. (9) satisfies LO.

In a bipartite setting, LO is equivalent to No-Signalling [Util. LO comes
into its own in the multipartite setting, where Eq. (18) is more restrictive than
the No-Signalling condition. In a theory satisfying LO, the maximum payoff
achievable by the players of a generic game is upper bounded as

$$\omega_T \le \omega_{\rm LO}, \tag{19}$$

where $\omega_{\rm LO}$ denotes the maximum of the payoff $\omega$ over all probability
distributions p(y|x) satisfying LO.
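
Definitions 7 and 8 translate directly into a brute-force test for small scenarios. The following sketch, with hypothetical function names and example boxes, enumerates all locally orthogonal sets of events for two parties with binary inputs and outputs and returns the largest value of the sum in Eq. (18).

```python
import itertools

def locally_orthogonal(e1, e2):
    """Events are pairs (x, y) of tuples holding one input and one output per party."""
    (x1, y1), (x2, y2) = e1, e2
    return any(a == b and c != d for a, b, c, d in zip(x1, x2, y1, y2))

def max_orthogonal_sum(prob, events):
    """Largest sum of prob(e) over all locally orthogonal sets of events (brute force)."""
    best = 0.0

    def extend(chosen, start, total):
        nonlocal best
        best = max(best, total)
        for i in range(start, len(events)):
            if all(locally_orthogonal(events[i], e) for e in chosen):
                extend(chosen + [events[i]], i + 1, total + prob(events[i]))

    extend([], 0, 0.0)
    return best

bits = [(0, 0), (0, 1), (1, 0), (1, 1)]
events = list(itertools.product(bits, bits))  # all events ((x1, x2), (y1, y2))

def pr_box(event):
    (x1, x2), (y1, y2) = event
    return 0.5 if (y1 ^ y2) == x1 * x2 else 0.0

def signalling_box(event):
    # Party 2 outputs the input of party 1: a signalling distribution.
    (x1, x2), (y1, y2) = event
    return 1.0 if (y1 == 0 and y2 == x1) else 0.0
```

The PR box saturates the bound at the first level, consistently with the bipartite equivalence between LO and No-Signalling, whereas the signalling box exceeds it. The enumeration is exponential in the number of events and is only meant for toy scenarios.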

Note that LO has a slightly different flavour from other device-independent
principles. Indeed, principles like Nontrivial Communication Complexity, No
Advantage in Nonlocal Computation and Information Causality are expressed
as limitations about some distinguished information-theoretic task. Such
limitations are subsequently used to derive upper bounds on the payoffs of nonlocal
games. Instead, LO is defined as an upper bound on a payoff, as one can see
by comparing the l.h.s. of Eq. (18) with the expression of the payoff $\omega$. The particular
games that define the LO constraint have been characterized in Ref. [17] and
have been therein named maximally difficult guessing games. In this sense, Eq.
(19) represents the upper bound on the payoff of a generic game under the
condition that the payoff in some privileged class of games is upper bounded as in
Eq. (18).


LO can be generalized to an infinite hierarchy of constraints [11, 17]. This is
done as follows: Suppose that the N parties are given k copies of the black box
generating outputs according to the conditional probability distribution p(y|x).
As a result, the overall input-output distribution will be given by

$$p^{\otimes k}(y_1 y_2 \dots y_k | x_1 x_2 \dots x_k) = p(y_1|x_1)\, p(y_2|x_2) \cdots p(y_k|x_k).$$

Defining the event $e_k = (y_1 y_2 \dots y_k | x_1 x_2 \dots x_k)$ and its probability $p^{\otimes k}(e_k) :=
p^{\otimes k}(y_1 y_2 \dots y_k | x_1 x_2 \dots x_k)$, one can formulate the k-th level of the LO
hierarchy as

Definition 9 (Local Orthogonality at the k-th level [11, 17]). A conditional
probability distribution p(y|x) satisfies LO at the k-th level iff

$$\sum_{e_k\in\mathcal{O}_k} p^{\otimes k}(e_k) \le 1 \tag{20}$$

for every locally orthogonal set $\mathcal{O}_k$. A theory satisfies LO at the k-th level iff
every probability distribution generated as in Eq. (9) satisfies LO at the k-th
level.


By increasing k, one gets more and more restrictive conditions on the
probability distribution p(y|x). For example, PR box correlations satisfy LO for
k = 1, but violate it for $k \ge 2$ [11].


9. Deriving Local Orthogonality and Causality

We now provide a derivation of LO from Measurement Purification,
Locality of Pure Orthogonal Measurements, and Sufficient Orthogonality. Since LO
implies No-Signalling 0,113 and in our framework No-Signalling is equivalent
to Causality (Proposition 1 of this paper), our derivation of LO also amounts
to a derivation of Causality. The derivation consists of a few steps, discussed in
the following paragraphs.

9.1. Local Orthogonality for pure orthogonal measurements 

We start by showing the validity of LO for probability distributions generated 
by pure orthogonal measurements: 

Lemma 2. Let $p(y|x) = (m^x_y|\rho)$ be a set of probability distributions defined
as in Eq. (9), with the product effects $m^x_y$ arising from a set of pure orthogonal
measurements. If the theory satisfies Locality of Pure Orthogonal Measurements
and Sufficient Orthogonality, then p(y|x) satisfies Local Orthogonality at all
levels of the hierarchy.

Proof. Firstly, we consider the proof for the first level. For every locally
orthogonal set of events $\mathcal{O}$, we have to prove the relation $\sum_{(x,y)\in\mathcal{O}} p(y|x) \le 1$.
The proof runs as follows: First, consider an arbitrary party i and a fixed (but
otherwise arbitrary) input value $x_i$. By hypothesis, the effects $\{m^{x_i}_{y_i}\}_{y_i\in Y_i}$ are
orthogonal, which means that there exists a set of states $\{\rho^{x_i}_{y_i}\}_{y_i\in Y_i} \subset \mathsf{St}(S_i)$
such that

$$(m^{x_i}_{y_i}|\rho^{x_i}_{y'_i}) = \delta_{y_i, y'_i} \qquad \forall y_i, y'_i \in Y_i.$$

Now, define the product states

$$|\rho^{x}_{y}) := |\rho^{x_1}_{y_1}) \otimes \cdots \otimes |\rho^{x_N}_{y_N})$$

and the product effects

$$(m^{x}_{y}| := (m^{x_1}_{y_1}| \otimes \cdots \otimes (m^{x_N}_{y_N}|.$$

By the Locality of Pure Orthogonal Measurements, the effects $m^x_y$ are pure.
With this definition, if two events (x, y) and (x', y') are locally orthogonal,
then one has

$$(m^{x}_{y}|\rho^{x'}_{y'}) = 0,$$

because the factor corresponding to a party i with $x_i = x'_i$ and $y_i \neq y'_i$ vanishes.
By definition, this means that the pure effects $\{m^x_y\}_{(x,y)\in\mathcal{O}}$ are orthogonal.
Invoking Sufficient Orthogonality, we have that there exists a measurement m
such that $\{m^x_y\}_{(x,y)\in\mathcal{O}} \subseteq m$. Using this fact we obtain

$$\sum_{(x,y)\in\mathcal{O}} p(y|x) = \sum_{(x,y)\in\mathcal{O}} (m^x_y|\rho) \le \sum_{e\in\overline{\mathcal{O}}} (m_e|\rho) = 1,$$

where $\overline{\mathcal{O}}$ denotes the set of all outcomes of the measurement m. The above
inequality concludes the proof of LO in the case when each party performs a
pure orthogonal measurement on one subsystem of a composite system. The
argument can be easily extended to prove LO at every level: in this case, one
has simply to replace x and y with the strings $x_1 x_2 \dots x_k$ and $y_1 y_2 \dots y_k$,
respectively. □

9.2. Local Orthogonality for generic measurements 

Having derived LO for pure orthogonal measurements, it is easy to extend the 
derivation to arbitrary measurements. The strategy is to extend the proof first 
to spiky measurements (by coarse-graining) and then to arbitrary measurements 
(by measurement purification). The first step is achieved by the following 

Lemma 3. Let p(z|x) be a conditional probability distribution of the variable
$z \in \prod_{i=1}^N Z_i$ conditional to the variable $x \in \prod_{i=1}^N X_i$. Let p(y|x) be the probability
distribution resulting from local coarse-grainings of p(z|x), that is,

$$p(y|x) = \sum_{z\in\prod_{i=1}^N Z_{y_i}} p(z|x) \qquad \forall y \in \prod_{i=1}^N Y_i,$$

where, for each i, $\{Z_{y_i}\}_{y_i\in Y_i}$ is a partition of $Z_i$ into disjoint subsets. If the
distributions p(z|x) satisfy LO, then also the coarse-grained distributions p(y|x)
satisfy LO.
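
The local coarse-graining used in the lemma can be spelled out in code. The sketch below is only an illustration: the dictionary representation of the distributions, the function name `coarse_grain`, and the example partition are assumptions made here.

```python
from collections import defaultdict
import itertools

def coarse_grain(p_fine, partitions):
    """Locally coarse-grain p(z|x).

    p_fine maps (x, z) to a probability, with x and z tuples (one entry per party).
    partitions[i] maps each fine outcome z_i of party i to the coarse label y_i
    of the subset Z_{y_i} that contains it.
    """
    p_coarse = defaultdict(float)
    for (x, z), prob in p_fine.items():
        y = tuple(partitions[i][z_i] for i, z_i in enumerate(z))
        p_coarse[(x, y)] += prob
    return dict(p_coarse)

# Illustrative two-party distribution: one input each, three uniform independent outcomes.
x = (0, 0)
p_fine = {(x, z): 1.0 / 9.0 for z in itertools.product(range(3), repeat=2)}
partition = {0: "a", 1: "b", 2: "b"}   # Z_a = {0}, Z_b = {1, 2}
p_coarse = coarse_grain(p_fine, [partition, partition])
```

Each coarse-grained probability is the sum of the fine-grained probabilities over the product of the selected subsets, as in the formula of the lemma.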


We omit the proof of the lemma, which can be found in Section V of Ref. [1].

An immediate corollary is the following: 


Corollary 1. Let $p(y|x) = (m^x_y|\rho)$ be a set of probability distributions defined
as in Eq. (9) with a set of spiky measurements. If the theory satisfies Locality
of Pure Orthogonal Measurements and Sufficient Orthogonality, then p(y|x)
satisfies LO at all levels of the hierarchy.


Combining this observation with Measurement Purification, one can prove 
the desired result: 


Theorem 1. Every theory that satisfies Axioms 1, 2, and 3 must satisfy LO at
every level of the hierarchy.

Proof. Let us start from the first level of the hierarchy. Let p(y|x) be an
arbitrary probability distribution arising from local measurements $m^{x_i}$ as in Eq.
(9). For every party i, use Lemma 1 to represent the measurement $m^{x_i}$ as

$$(m^{x_i}_{y_i}| = (M^{x_i}_{y_i}|\,\big[\mathcal{I}_{S_i} \otimes |\sigma_i)\big]$$

for some spiky measurement $M^{x_i}$ on $S_i \otimes E_i$ and for some state $\sigma_i \in \mathsf{St}(E_i)$.
Now, by construction the conditional probability distribution is equal to

$$p(y|x) = (M^x_y|\sigma)$$

where

$$(M^x_y| := (M^{x_1}_{y_1}| \otimes \cdots \otimes (M^{x_N}_{y_N}|, \qquad |\sigma) := |\rho) \otimes |\sigma_1) \otimes \cdots \otimes |\sigma_N)$$

(with a little abuse of notation, consisting in the fact that the systems are
ordered as $S_1 E_1 S_2 E_2 \cdots S_N E_N$ in the expression of the effect $M^x_y$ and as
$S_1 S_2 \cdots S_N E_1 E_2 \cdots E_N$ in the expression of the state $\sigma$). By Corollary 1 we
conclude that p(y|x) satisfies LO at every level of the hierarchy. □


9.3. Deriving Causality

Since LO implies No-Signalling, we have just shown that every probability
distribution generated by measurements in a theory satisfying Axioms
1-3 satisfies No-Signalling. Under the minimalistic Assumption 1, Proposition
1 tells us that the theory must satisfy Causality:


Corollary 2. If a theory satisfies Axioms 1-3 and Assumption 1, then the theory
satisfies Causality.

The fact that Causality follows from the axioms, rather than being assumed
from the outset, is quite remarkable. Up to now, the only axiomatization
of quantum theory that does not assume Causality from the outset is Hardy’s
2011 axiomatization [21]. There, Causality is derived from an axiom called
Sharpness, stating that for every pure state there exists a unique effect that
gives probability one on that state and only on that state (see footnote 12).
It is worth stressing that, as per today, a few works acknowledge Causality
explicitly as an axiom, while most works assume Causality implicitly as part
of the framework (see footnote 13), see e.g. [34], [38], [22], [0, 411.



Recognizing Causality as an axiom is a good starting point to explore
deviations from it, thus developing an operational approach to quantum gravity and
indefinite causal structure [93], [94], [37], [95], [96], [97].


10. Deriving Sufficient Orthogonality 

In the previous section we showed that Local Orthogonality and Causality 
can be obtained from three requirements on the structure of measurements. 
While the first two requirements (Measurement Purification and Locality of Pure 
Orthogonal Measurements) are physically well motivated, the third (Sufficient 
Orthogonality) sounds rather ad hoc. Can one reduce it to some other, better 
motivated axiom? In this section we give a possible answer, which, however,
requires us to assume Causality.


10.1. No Disturbance Without Information 

Informally, the No Disturbance Without Information (NDWI) principle states
that if a measurement extracts no information about a source, then the measurement
can be implemented without disturbing the states in that source. NDWI
appeared originally in the axiomatization work of Ref. [20], where it was obtained
as a consequence of the axioms (cf. Corollary 10 of [20]). Recently, NDWI
has been promoted to the rank of an axiom by Pfister and Wehner [41], who
showed that every discrete theory satisfying this requirement must be classical.

In order to give the precise statement of NDWI, it is useful to give some
definitions. Here by source we mean a deterministic state $\rho$, considered as the
average state of an ensemble of signal states. A state in the source $\rho$ is a state
that can be contained in a convex decomposition of $\rho$:


Definition 10. Let $\rho$ and $\tau$ be two deterministic states of system $S$. We say
that $\tau$ is in the source $\rho$ iff there exists a nonzero probability $p > 0$ and a state
$\tau' \in \mathrm{St}_1(S)$ such that $\rho = p\,\tau + (1 - p)\,\tau'$.
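
As an illustration of Definition 10, the following minimal sketch treats the special case of a classical theory, where deterministic states are probability vectors. This restriction, and the function names `max_weight_in_source` and `is_in_source`, are assumptions made only for the example. In this setting a state $\tau$ is in the source $\rho$ exactly when some positive multiple of $\tau$ can be subtracted from $\rho$ without producing negative entries.

```python
from fractions import Fraction

def max_weight_in_source(rho, tau):
    """Largest p in [0, 1] such that rho - p*tau is entrywise non-negative.

    rho and tau are normalized classical states (probability vectors).
    tau is in the source rho iff the returned value is strictly positive.
    """
    ratios = [r / t for r, t in zip(rho, tau) if t > 0]
    return min(min(ratios), Fraction(1))

def is_in_source(rho, tau):
    """Definition 10, specialized to classical probability vectors."""
    return max_weight_in_source(rho, tau) > 0

rho = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]
tau_inside = [Fraction(1), Fraction(0), Fraction(0)]   # support inside that of rho
tau_outside = [Fraction(0), Fraction(0), Fraction(1)]  # support outside that of rho

print(max_weight_in_source(rho, tau_inside))  # 1/2
print(is_in_source(rho, tau_outside))         # False
```

In the first case one can take $p = 1/2$ and $\tau' = (0, 1, 0)$, which reproduces the decomposition $\rho = p\,\tau + (1-p)\,\tau'$ of the definition.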


With this definition, a non-informative measurement is one that gives the 
same statistics for all possible states in the source: 


Definition 11 (Non-informative measurements). Let $m$ be a measurement on
system $S$, with outcomes in the set $Y$. We say that the measurement $m$ does
not extract information about the source $\rho \in \mathrm{St}_1(S)$ iff there exists a set of
probabilities $\{p_y\}_{y \in Y}$ such that

$(m_y|\tau) = p_y \qquad \forall y \in Y$

for every state $\tau$ in the source.


12 Note that the Sharpness axiom by Hardy is slightly different from the Sharpness axiom
used by Wilce in Ref. [28].

13 For example, Causality enters in the convex set framework [38] in the moment when
measurements are defined as decompositions of the order unit.


In other words, a measurement extracts no information about the source $\rho$
iff the probability of the outcome $y$ is the same for every state in a convex
decomposition of $\rho$. In Ref. [41] Pfister and Wehner consider the special case of
non-informative measurements where the measurement gives an outcome with
certainty, i.e. $p_{y_0} = 1$ for a particular outcome $y_0$.
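
Definition 11 can be checked mechanically in the special case of a classical theory. The sketch below is an illustration under that assumption, with the hypothetical helper name `extracted_probabilities`. States are probability vectors and an effect is a vector of outcome probabilities on the pure classical states. Because the states in the source $\rho$ are exactly those supported inside the support of $\rho$, the measurement is non-informative iff every effect is constant on that support.

```python
from fractions import Fraction

def extracted_probabilities(effects, rho):
    """Return [p_y] if the classical measurement is non-informative on rho, else None.

    effects[y][i] is the probability of outcome y on the i-th pure classical state.
    """
    support = [i for i, r in enumerate(rho) if r > 0]
    probs = []
    for effect in effects:
        values = {effect[i] for i in support}
        if len(values) != 1:      # the outcome probability depends on the state
            return None
        probs.append(values.pop())
    return probs

rho = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]

# Outcome 0 occurs with certainty on the whole source (the special case of Ref. [41]).
coarse = [[1, 1, 0], [0, 0, 1]]
# This measurement tells the two pure states in the support of rho apart.
fine = [[1, 0, 0], [0, 1, 1]]

print(extracted_probabilities(coarse, rho))  # [1, 0]
print(extracted_probabilities(fine, rho))    # None
```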

Let us specify what it means to realize a measurement without disturbing 
the states in a source: 


Definition 12 (Realization of a measurement). Let $\mathcal{T} = \{\mathcal{T}_y\}_{y \in Y}$ be a test of
type $A \to B$ and let $u \in \mathrm{Eff}_1(B)$ be a deterministic effect on system $B$. The
pair $(\mathcal{T}, u)$ is a realization of the measurement $m = \{m_y\}_{y \in Y}$ iff one has

$(m_y| = (u|\mathcal{T}_y \qquad \forall y \in Y$. (22)

Definition 13 (Non-disturbing test). A test $\mathcal{T}$ of type $S \to S$ is non-disturbing
for the source $\rho$ iff one has

$\sum_{y \in Y} \mathcal{T}_y |\tau) = |\tau)$,

for every state $\tau$ in the source.

Definition 14 (Non-disturbing realization). A measurement $m$ on system $S$
admits a non-disturbing realization for the source $\rho$ iff there exists a realization
of $m$, call it $(\mathcal{T}, u)$, such that $\mathcal{T}$ is non-disturbing for the source $\rho$.

Using the above definitions, we can give the precise statement of NDWI:

Definition 15. A theory satisfies No Disturbance Without Information (NDWI)
iff every measurement $m$ that does not extract information about the source $\rho$
has a realization $(\mathcal{T}, u)$ that is non-disturbing for this source.

10.2. From Causality and NDWI to the joint distinguishability of orthogonal 
states 

Here we show that Causality and NDWI imply that orthogonal states can 
be perfectly distinguished. Although this fact may sound obvious (it is trivially 
true in Quantum Theory), its validity is far from obvious in a general physical 
theory. 


Theorem 2 (Orthogonal states are perfectly distinguishable). In a convex theory¹⁴
satisfying Causality and NDWI, orthogonal states are perfectly distinguishable.

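Proof. Let $\{\rho_y\}_{y \in Y}$ be a set of orthogonal states. By definition of orthogonality, there exists
a set of effects $\{m_y\}_{y \in Y}$ such that $(m_y|\rho_{y'}) = \delta_{y,y'}$ for every $y, y' \in Y$. As
a consequence, the measurement $\mathbf{m}_y := \{m_y, m_y^\perp\}$, $m_y^\perp := u - m_y$, does not
extract information about the source

$\rho_y^\perp := \frac{1}{|Y| - 1} \sum_{y' \neq y} \rho_{y'}$.

Indeed, the relations $(m_y|\rho_{y'}) = 0$ and $(m_y^\perp|\rho_{y'}) = 1$, valid for every $y' \neq y$, imply analogous relations

$(m_y|\tau) = 0$
$(m_y^\perp|\tau) = 1$ (23)

for every state $\tau$ in the source $\rho_y^\perp$.

By the NDWI axiom, $\mathbf{m}_y$ has a non-disturbing realization, given by two
transformations $\{\mathcal{T}_y, \mathcal{T}_y^\perp\}$ such that

$(u|\mathcal{T}_y = (m_y|$
$(u|\mathcal{T}_y^\perp = (m_y^\perp|$ (24)

and $(\mathcal{T}_y + \mathcal{T}_y^\perp)|\tau) = |\tau)$ for every state $\tau$ in the source $\rho_y^\perp$. In particular, we
have

$(\mathcal{T}_y + \mathcal{T}_y^\perp)|\rho_{y'}) = |\rho_{y'}) \qquad \forall y' \neq y$. (25)

Applying the unit effect on both sides of Eq. (25) and using the relations above,
one obtains $(u|\mathcal{T}_y|\rho_{y'}) = 0$. By Causality, this relation implies $\mathcal{T}_y|\rho_{y'}) = 0$.¹⁵
Hence, Eq. (25) becomes

$\mathcal{T}_y|\rho_{y'}) = 0$ (26)

$\mathcal{T}_y^\perp|\rho_{y'}) = |\rho_{y'}) \qquad \forall y \in Y,\ y' \neq y$. (27)

Note also that by construction we have $(u|\mathcal{T}_y^\perp|\rho_y) = (m_y^\perp|\rho_y) = 0$, which implies

$\mathcal{T}_y^\perp|\rho_y) = 0$. (28)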


14 A “convex theory” is defined as a theory where all the sets of states, effects, and
transformations are convex. So far, we never assumed convexity, and indeed such an assumption is not
part of the basic framework of OPTs. Even in the present theorem, we will use convexity only
in a minor way, just to guarantee that we can mix with non-zero probabilities the states in a
given set.

15 Causality implies that for every effect $a \in \mathrm{Eff}(S)$, the two effects $\{a, u - a\}$ form a
legitimate measurement. Hence, the condition $(u|\rho) = 0$ implies $(a|\rho) = 0$ for every
effect, i.e. $\rho = 0$.
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In addition, we can also assume without loss of generality 


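$\mathcal{T}_y|\rho_y) = |\rho_y)$, (29)

because we can always replace $\mathcal{T}_y$ with $\mathcal{T}'_y := |\rho_y)(u|\mathcal{T}_y$.¹⁶

Summarizing Eqs. (26), (27), (28) and (29) we have

$\mathcal{T}_y|\rho_{y'}) = \delta_{y,y'}\,|\rho_y)$
$\mathcal{T}_y^\perp|\rho_{y'}) = (1 - \delta_{y,y'})\,|\rho_{y'}) \qquad \forall y, y' \in Y$. (30)

Now, the test $\{\mathcal{T}_y, \mathcal{T}_y^\perp\}$ allows one to discriminate between the state $\rho_y$ and
all the other states $\{\rho_{y'}\}_{y' \in Y, y' \neq y}$ without introducing any disturbance. Hence,
one way to distinguish perfectly the states $\{\rho_y\}_{y \in Y}$ is to enumerate the elements
of $Y$, say $Y = (y_1, \dots, y_N)$, and to apply the tests $\{\mathcal{T}_{y_n}, \mathcal{T}_{y_n}^\perp\}$ one after the other.
The resulting test, denoted by $\{\mathcal{S}_y\}_{y \in Y}$, will consist of the transformations

$\mathcal{S}_{y_1} := \mathcal{T}_{y_1}$
$\mathcal{S}_{y_2} := \mathcal{T}_{y_2} \mathcal{T}_{y_1}^\perp$
$\mathcal{S}_{y_3} := \mathcal{T}_{y_3} \mathcal{T}_{y_2}^\perp \mathcal{T}_{y_1}^\perp$
$\vdots$
$\mathcal{S}_{y_{N-1}} := \mathcal{T}_{y_{N-1}} \mathcal{T}_{y_{N-2}}^\perp \cdots \mathcal{T}_{y_1}^\perp$
$\mathcal{S}_{y_N} := \mathcal{T}_{y_{N-1}}^\perp \mathcal{T}_{y_{N-2}}^\perp \cdots \mathcal{T}_{y_1}^\perp$.

Clearly, Eq. (30) implies $\mathcal{S}_{y_m}|\rho_{y_n}) = \delta_{m,n}\,|\rho_{y_n})$.
Hence, the states $\{\rho_y\}_{y \in Y}$ can
be perfectly distinguished using the measurement $m$ defined by $(m_y| := (u|\mathcal{S}_y$,
$\forall y \in Y$.

□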

Note the difference between the statement of Theorem 2 and the statement that
a set of pairwise distinguishable states are jointly distinguishable. As we already 
observed, orthogonality [cf. definition 0] and pairwise distinguishability are 
different notions. In general, it is not clear whether the joint distinguishability 
of pairwise distinguishable states follows from NDWI. 
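
The sequential construction used in the proof of Theorem 2 can be illustrated with a small sketch. It assumes a classical theory in which the orthogonal states are probability vectors with disjoint supports, and it takes as a hypothetical non-disturbing realization the transformation that keeps only the entries in the support of $\rho_y$ (playing the role of $\mathcal{T}_y$), together with its complement (playing the role of $\mathcal{T}_y^\perp$). Both are diagonal 0/1 matrices, stored here as masks, so composing them amounts to multiplying masks entrywise. The function names are illustrative only.

```python
from fractions import Fraction

def apply_mask(mask, state):
    """Apply a diagonal 0/1 transformation to a classical state."""
    return [m * s for m, s in zip(mask, state)]

def sequential_test(supports, dim):
    """Masks for S_{y_1}, ..., S_{y_N}, built from T_y and T_y^perp as in the proof."""
    T = [[1 if i in supp else 0 for i in range(dim)] for supp in supports]
    T_perp = [[1 - t for t in mask] for mask in T]
    S, passed = [], [1] * dim            # passed = T_perp_{y_{n-1}} ... T_perp_{y_1}
    for n in range(len(supports) - 1):
        S.append([t * p for t, p in zip(T[n], passed)])
        passed = [tp * p for tp, p in zip(T_perp[n], passed)]
    S.append(passed)                     # last outcome: every earlier test answered "perp"
    return S

states = [
    [Fraction(1, 2), Fraction(1, 2), Fraction(0), Fraction(0)],
    [Fraction(0), Fraction(0), Fraction(1), Fraction(0)],
    [Fraction(0), Fraction(0), Fraction(0), Fraction(1)],
]
supports = [{0, 1}, {2}, {3}]
S = sequential_test(supports, dim=4)

# S_{y_m} leaves rho_{y_m} untouched and annihilates every other state in the set.
for m, mask in enumerate(S):
    for n, rho in enumerate(states):
        expected = rho if m == n else [0] * 4
        assert apply_mask(mask, rho) == expected
```

In this toy model the effect $(u|\mathcal{S}_y$ is simply the sum of the entries that survive the mask, so outcome $y_m$ occurs with probability one on $\rho_{y_m}$ and with probability zero on the other states.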

10.3. Strong No Disturbance Without Information 

We now present a strengthened version of the NDWI axiom, stating that, 
in addition to not disturbing the states in a given source, a non-informative 
measurement does not disturb the pure effects that occur with unit probability 
on those states: 


Definition 16 (Strongly non-disturbing test). The test $\mathcal{T} = \{\mathcal{T}_y\}_{y \in Y}$ is strongly
non-disturbing for the source $\rho \in \mathrm{St}_1(S)$ iff

$\sum_{y \in Y} \mathcal{T}_y |\tau) = |\tau)$

for every state $\tau$ in the source and

$\sum_{y \in Y} (a|\mathcal{T}_y = (a|$

for every pure effect $a$ such that $(a|\tau) = 1$.


16 The existence of the transformation $\mathcal{T}'_y$ is guaranteed by Causality along with Assumption
2. Indeed, one can perform the test $\{\mathcal{T}_y, \mathcal{T}_y^\perp\}$ and, conditionally on outcome $y$, re-prepare the
state $\rho_y$.

A strongly non-disturbing realization of a measurement is defined in the
obvious way, as a realization in terms of a strongly non-disturbing test. Using
this definition, we can now state the strong version of the NDWI principle:

Axiom 3' (StrongNDWI). Every measurement that does not extract information
about the source $\rho$ has a strongly non-disturbing realization for this source.

10.4. Derivation of SO

It is easy to prove that Causality and StrongNDWI imply Sufficient Orthogonality:

Theorem 3. Every convex theory satisfying Causality and StrongNDWI must 
satisfy SO. 

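Proof. Let $\{a_y\}_{y \in Y}$ be a set of pure orthogonal effects and let $\{\rho_y\}_{y \in Y}$ be a set
of states biorthogonal to it. For these two sets, we follow the construction of
Theorem 2: we consider the measurement $\mathbf{m}_y = \{a_y, a_y^\perp\}$, $a_y^\perp := u - a_y$, and note
that it is non-informative for the state

$\rho_y^\perp = \frac{1}{|Y| - 1} \sum_{y' \neq y} \rho_{y'}$.

By the StrongNDWI principle, $\mathbf{m}_y$ will have a strongly non-disturbing realization,
given by a binary test $\{\mathcal{T}_y, \mathcal{T}_y^\perp\}$ such that

$(u|\mathcal{T}_y = (a_y|$
$(u|\mathcal{T}_y^\perp = (a_y^\perp|$ (31)

and

$(a_{y'}|\mathcal{T}_y + (a_{y'}|\mathcal{T}_y^\perp = (a_{y'}| \qquad \forall y' \neq y$,

the last equation coming from the StrongNDWI condition. Now, since $a_{y'}$ is
a pure effect, the above equation implies $(a_{y'}|\mathcal{T}_y = p\,(a_{y'}|$ and $(a_{y'}|\mathcal{T}_y^\perp = (1 - p)\,(a_{y'}|$
for some probability $p \in [0,1]$. Now, it is easy to show that $p = 0$:
indeed, we have

$p = p\,(a_{y'}|\rho_{y'})$
$= (a_{y'}|\mathcal{T}_y|\rho_{y'})$
$\le (u|\mathcal{T}_y|\rho_{y'})$
$= (a_y|\rho_{y'})$
$= 0$.

Hence, we conclude that

$(a_{y'}|\mathcal{T}_y^\perp = (a_{y'}| \qquad \forall y' \neq y$. (32)

Like in the proof of Theorem 2, we now enumerate the elements of $Y$, say
$Y = (y_1, \dots, y_N)$, and apply the tests $\{\mathcal{T}_{y_n}, \mathcal{T}_{y_n}^\perp\}$ one after the other. As a result,
we obtain a test $\{\mathcal{S}_y\}$ with outcomes in the set $\widetilde{Y} := Y \cup \{\mathrm{rest}\}$, defined by

$\mathcal{S}_{y_1} := \mathcal{T}_{y_1}$
$\mathcal{S}_{y_2} := \mathcal{T}_{y_2} \mathcal{T}_{y_1}^\perp$
$\mathcal{S}_{y_3} := \mathcal{T}_{y_3} \mathcal{T}_{y_2}^\perp \mathcal{T}_{y_1}^\perp$
$\vdots$
$\mathcal{S}_{y_N} := \mathcal{T}_{y_N} \mathcal{T}_{y_{N-1}}^\perp \cdots \mathcal{T}_{y_1}^\perp$
$\mathcal{S}_{\mathrm{rest}} := \mathcal{T}_{y_N}^\perp \mathcal{T}_{y_{N-1}}^\perp \cdots \mathcal{T}_{y_1}^\perp$.

To conclude the proof, we consider the measurement $m$ defined by $(m_y| := (u|\mathcal{S}_y$,
$\forall y \in \widetilde{Y}$: for this measurement we have

$(m_{y_1}| = (u|\mathcal{T}_{y_1} = (a_{y_1}|$
$(m_{y_2}| = (u|\mathcal{T}_{y_2} \mathcal{T}_{y_1}^\perp = (a_{y_2}|\mathcal{T}_{y_1}^\perp = (a_{y_2}|$
$(m_{y_3}| = (u|\mathcal{T}_{y_3} \mathcal{T}_{y_2}^\perp \mathcal{T}_{y_1}^\perp = (a_{y_3}|\mathcal{T}_{y_2}^\perp \mathcal{T}_{y_1}^\perp = (a_{y_3}|$
$\vdots$
$(m_{y_N}| = (u|\mathcal{T}_{y_N} \mathcal{T}_{y_{N-1}}^\perp \cdots \mathcal{T}_{y_1}^\perp = (a_{y_N}|\mathcal{T}_{y_{N-1}}^\perp \cdots \mathcal{T}_{y_1}^\perp = (a_{y_N}|$
$(m_{\mathrm{rest}}| = (u|\mathcal{T}_{y_N}^\perp \mathcal{T}_{y_{N-1}}^\perp \cdots \mathcal{T}_{y_1}^\perp = (u| - \sum_{y \in Y} (a_y|$,

where we used Eqs. (31) and (32). Hence, we have proven that an arbitrary
set of pure orthogonal effects $\{a_y\}_{y \in Y}$ can coexist in a single measurement, as
required by SO. □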


11. The device-independent framework for contextuality 

In this section we review the device-independent framework for studying 
contextuality. We also review the principle known as Consistent Exclusivity
(CE).
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11.1. Contextual games 

Consider a game featuring a referee, who asks a question $x \in X$, and a player,
who responds with an answer $y \in Y_x$. In general, two different questions may
have overlapping sets of answers, i.e. one can have $Y_x \cap Y_{x'} \neq \emptyset$ for some $x \neq x'$.
At each round of the game, the referee chooses a question $x$ at random with
probability $q(x)$ and assigns a payoff $\omega(x,y)$ to the answer $y$. The goal of the
player is to maximize her expected payoff, given by

$\omega = \sum_x q(x) \left[ \sum_y \omega(x,y)\, p(y|x) \right]$, (33)

where $p(y|x)$ is the conditional probability of producing the output $y$ upon
receiving the input $x$. We call a game of the above form a contextual game, by
analogy with the non-local games discussed before.

Without further restrictions, the maximization of the payoff is trivial: for
every given question $x$, the player only needs to respond deterministically with
the answer $y(x)$ that maximizes $\omega(x,y)$. The problem becomes non-trivial if the
player is forced to assign to each answer $y$ a probability that does not depend on
which question (among the questions that have $y$ as a possible answer) is asked
by the referee. Mathematically, this amounts to the response non-contextuality
condition¹⁷

$p(y|x) = p(y|x') \qquad \forall x, x' \in X,\ \forall y \in Y_x \cap Y_{x'}$. (36)

A strategy satisfying Eq. (36) is a strategy where the player partly disregards
the question $x$. Partly, because she will still make use of her knowledge of $x$,

by restricting the range of her answers to the set $Y_x$. Note that response
non-contextuality is different from no-signalling, in that the choice of question $x$ can
affect the conditional probability distribution of the answer $y$, albeit in a highly
constrained way.

We call a strategy response-non-contextual iff it satisfies the response
non-contextuality constraint of Eq. (36). One way to enforce response non-contextuality
is by imagining that the game is played a large number of times, allowing the referee
to estimate the conditional probability distribution $p(y|x)$ and to penalize
deviations from Eq. (36).


17 “Non-contextuality” here refers to the fact that the context $x$ that gives rise to an answer
does not influence the probability of its occurrence. The reader should not confuse the response
non-contextuality condition of Eq. (36) with the statement that quantum mechanics
is “contextual”. The latter is just a shorthand for the statement that some of the conditional
probability distributions $p(y|x) = \mathrm{Tr}[P^x_y\, \rho]$, generated by measurements on quantum states,
cannot be reproduced by a non-contextual ontological model, i.e. cannot be written in
the form

$p(y|x) = \sum_{\lambda \in \Lambda} p(\lambda)\, r_\lambda(y|x)$ (34)

where $\lambda$ is a random variable with probability distribution $p(\lambda)$, and, for every $\lambda \in \Lambda$, $r_\lambda(y|x)$
is a deterministic, non-contextual response function, that is, a conditional probability
distribution of the form

$r_\lambda(y|x) = 1$ if $y = f_\lambda(x)$, and $r_\lambda(y|x) = 0$ if $y \neq f_\lambda(x)$ (35)

for some suitable function $f_\lambda : X \to Y$ satisfying the conditions

1. normalization: for every question $x \in X$, there exists only one answer $y \in Y$ such that
$y = f_\lambda(x)$

2. non-contextuality: for every pair of questions $x, x' \in X$ and every answer $y \in Y_x \cap Y_{x'}$,
one has $f_\lambda(x) = f_\lambda(x')$.


11.2. The graph-theoretic framework 

Contextual games can be conveniently cast in a graph-th eore tic framework 
98 . 17l. which has its roots in the framework test space [jonU Toil . I 99 I 1 (see 

also [Toi, Eol). 

To a given game, one can associate a hypergraph¹⁹ $H = (Y, X)$ as follows:

1. the vertices are the answers in $Y := \bigcup_{x \in X} Y_x$,

2. the hyperedges are the questions in $X$, with the question $x \in X$ being
identified with the subset of its possible answers $Y_x$
[accordingly, we will write $y \in x$ in place of $y \in Y_x$].

A conditional probability distribution $p(y|x)$ obeying the response non-contextuality
condition (36) can be completely described by the function $w : Y \to [0,1]$ defined
by

$w(y) := p(y|x) \qquad \forall x \in X : y \in x$. (37)

The function $w$ is a probability weight on the hypergraph $H$, in the following
sense:

Definition 17. A function $w : Y \to [0,1]$ is a probability weight (or a state)
on the hypergraph $H = (Y, X)$ iff it satisfies the condition

$\sum_{y \in x} w(y) = 1 \qquad \forall x \in X$. (38)

In terms of the probability weight $w$, the payoff (33) can be re-written as

$\omega = \sum_{y \in Y} c(y)\, w(y)$, (39)

with $c(y) := \sum_{x \in X} q(x)\, \omega(x,y)$.

The maximization of the payoff over all possible response-non-contextual
strategies is then equivalent to the maximization of the payoff over all possible
probability weights. We denote such maximum by $\omega_{\mathrm{rnc}}$.
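
The graph-theoretic description above can be made concrete with a short sketch. The data structures (a dictionary mapping each question to its set of answers) and the function names are choices made for this illustration, not part of the framework itself. The sum defining $c(y)$ is taken over the questions that contain $y$, on the reading that $\omega(x,y)$ is only specified for $y \in Y_x$.

```python
from fractions import Fraction

def is_probability_weight(hyperedges, w):
    """Definition 17: w(y) in [0, 1] and the weights on every hyperedge sum to 1."""
    in_range = all(0 <= p <= 1 for p in w.values())
    return in_range and all(sum(w[y] for y in edge) == 1 for edge in hyperedges.values())

def payoff_coefficients(hyperedges, q, omega):
    """c(y) = sum, over the questions x containing y, of q(x) * omega(x, y)."""
    c = {}
    for x, edge in hyperedges.items():
        for y in edge:
            c[y] = c.get(y, 0) + q[x] * omega[(x, y)]
    return c

def expected_payoff(c, w):
    """Eq. (39): sum over the vertices of c(y) * w(y)."""
    return sum(c[y] * w[y] for y in c)

# Three questions, each with two possible answers, pairwise sharing one answer.
hyperedges = {"x1": {"a", "b"}, "x2": {"b", "c"}, "x3": {"a", "c"}}
q = {x: Fraction(1, 3) for x in hyperedges}
# Hypothetical payoff function: the referee rewards the answer "a" only.
omega = {(x, y): (1 if y == "a" else 0) for x, edge in hyperedges.items() for y in edge}
w = {y: Fraction(1, 2) for y in "abc"}

c = payoff_coefficients(hyperedges, q, omega)
print(is_probability_weight(hyperedges, w))  # True
print(expected_payoff(c, w))                 # 1/3
```

The value 1/3 agrees with Eq. (33) evaluated directly: questions x1 and x3 each contribute (1/3)(1/2), and question x2 contributes nothing.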


18 Test spaces (modulo minor variations) have been also called manuals, spaces, hypergraphs,
cover spaces, and generalized sample spaces, see [99] for references to the original articles.

19 We recall that a hypergraph H = (Y,X) con sists of a collection of vertices Y, along with 
a collection X of subsets of Y, called hyperedges [Io| . 
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11.3. Principles about probability weights: the example of Consistent Exclusivity 
The physical limitations affecting the player's strategy will result in constraints
on the probability weight $w(y)$. Consistent Exclusivity is one such
requirement: it states that the sum of the probabilities associated to a set of
mutually exclusive vertices of the hypergraph $H$ should not exceed 1. The
precise definitions are given as follows:

Definition 18. Two vertices $\{y, y'\} \subseteq Y$ are exclusive iff there exists a hyperedge
$x$ such that $\{y, y'\} \subseteq x$. A subset of outcomes $E \subseteq Y$ is called mutually
exclusive iff every pair of distinct outcomes $\{y, y'\} \subseteq E$ are exclusive.

Definition 19 (Consistent Exclusivity). A probability
weight $w : Y \to [0,1]$ satisfies Consistent Exclusivity (CE) iff one has

$\sum_{y \in E} w(y) \le 1$ (40)

for every mutually exclusive set $E \subseteq Y$.
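
Definitions 18 and 19 can be tested by brute force on small hypergraphs. The following sketch is an illustration only: it enumerates all subsets of vertices, which is exponential in the number of vertices, and its function names are hypothetical. It uses the three-question hypergraph in which every pair of answers shares a question, so that the uniform weight 1/2 is a valid probability weight and yet violates CE on the set of all three answers.

```python
from fractions import Fraction
from itertools import combinations

def exclusive(hyperedges, y1, y2):
    """Definition 18: two vertices are exclusive iff some hyperedge contains both."""
    return any(y1 in edge and y2 in edge for edge in hyperedges.values())

def violates_ce(hyperedges, w):
    """Return a mutually exclusive set whose weights sum to more than 1, or None."""
    vertices = sorted(w)
    for size in range(2, len(vertices) + 1):
        for subset in combinations(vertices, size):
            pairwise = all(exclusive(hyperedges, a, b) for a, b in combinations(subset, 2))
            if pairwise and sum(w[y] for y in subset) > 1:
                return subset
    return None

triangle = {"x1": {"a", "b"}, "x2": {"b", "c"}, "x3": {"a", "c"}}
w_half = {y: Fraction(1, 2) for y in "abc"}
print(violates_ce(triangle, w_half))   # ('a', 'b', 'c')

single = {"x1": {"a", "b"}}
print(violates_ce(single, {"a": Fraction(1, 2), "b": Fraction(1, 2)}))  # None
```

Each hyperedge of the triangle carries total weight 1, so Eq. (38) holds, while the mutually exclusive set {a, b, c} carries total weight 3/2.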

The above formulation of CE is “device-independent”, in that it makes reference
to the probability weight $w$, but not to the specific set of measurements
that generate $w$. Nevertheless, the request that a probability weight
satisfies CE is hard to motivate on physical grounds.²⁰ To find such motivation,
we argue that one should look outside the device-independent framework. This
point will be discussed in Section 13.


11.4. The CE hierarchy

Like LO, CE can be generalized to an infinite hierarchy of constraints [17].
The $k$-th level of the hierarchy is defined by considering $k$ identical copies of a
black box, the $i$-th copy generating an answer $y_i$ with probability weight $w(y_i)$.
In this setting, the string of answers $\mathbf{y} = (y_1, y_2, \dots, y_k) \in Y^{\times k}$ has probability
weight

$w^{\otimes k}(\mathbf{y}) := w(y_1)\, w(y_2) \cdots w(y_k)$

and the $k$-th level of the hierarchy is defined as follows:

Definition 20 (Consistent Exclusivity at the $k$-th level). A probability
weight $w(y)$ satisfies CE at the $k$-th level iff one has

$\sum_{\mathbf{y} \in E_k} w^{\otimes k}(\mathbf{y}) \le 1$ (41)

for every mutually exclusive set $E_k \subseteq Y^{\times k}$.


20 Ref. [Tsll reports the following comment by Simon Kochen on the background role of CE 
in the Kochen-Specker paper: “Ernst and I spent many hours discussing the principle. [...] 
The difficulty lays in trying to justify it on general physical grounds, without already assuming 
the Hilbert space formalism of quantum mechanics.”. 
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11.5. Characterizing the degree of contextuality of projective quantum measure¬ 
ments 

Since the pioneering work of Kochen and Specker, projective measurements 
have played a privileged role in the study of contextuality in quantum mechanics 
(see Spekkens 14] for a discussion). Following this tradition, a number of recent 
works ([§8|,[l3, 3) have attempted a device-independent characterization of the 
input-output probability distributions of the form

$p(y|x) = \mathrm{Tr}[P^x_y\, \rho]$, (42)

where

1. $\rho$ is a quantum state, and

2. for every $x \in X$, $P^x := \{P^x_y\}_{y \in Y_x}$ is a projective quantum measurement
satisfying the non-contextuality condition

$P^x_y = P^{x'}_y \qquad \forall x, x' \in X,\ \forall y \in Y_x \cap Y_{x'}$. (43)

We call an input-output probability distribution of the form (42) projective
quantum (PQ).

It is not hard to see that, for given input/output alphabets, the set of PQ
input-output distributions is convex. Hence, characterizing it is equivalent to
characterizing the maximum payoffs achievable in all possible contextual games.
Since the maximum payoff is an indicator of the degree of contextuality, we refer
to the problem as “characterizing the degree of contextuality of projective
quantum measurements”. Just like in the case of nonlocality, finding a
device-independent characterization is a spectacularly hard problem. CE provides
remarkable results in this direction, but provably [13] not a complete characterization.


12. Physical implementation of contextual games 

Like in the case of non-locality, the framework of operational-probabilistic 
theories can be applied to the study of contextual games. For the physical 
implementation of a given contextual game, we propose the following model: 

Definition 21. A physical implementation of a contextual game is a protocol
where

1. the referee sends to the player a physical system $S$, prepared in a state $\rho$,
and a classical input $x \in X$, chosen at random with probability $q(x)$

2. the player performs a measurement $m^x = \{m^x_y\}_{y \in Y_x}$ on system $S$ and
communicates to the referee the measurement outcome $y$

3. the referee assigns the payoff $\omega(x,y)$ to the answer.

In the implementation of the protocol, the player's measurements are subject to
the following constraints:

1. they should satisfy the effect non-contextuality condition

$m^x_y = m^{x'}_y \qquad \forall x, x' \in X,\ \forall y \in Y_x \cap Y_{x'}$. (44)

2. they have to be performed on the input system provided by the referee (not
on some other system of the same type prepared in the player's laboratory).
In other words, the conditional probability distribution of the answer $y$
must satisfy

$p(y|x) = (m^x_y|\rho)$. (45)

The above physical implementation departs radically from the device-independent
scenario. This can be observed in the following points:

1. While the original game had only a classical input $x$ and a classical output
$y$, its physical implementation involves also the communication of a specific
physical system $S$, known to the player.

2. The effect non-contextuality condition (44) is device-dependent. In order
to check its validity, one needs to make a full tomography of the measurement
devices $\{m^x\}_{x \in X}$. Indeed, effect non-contextuality is a stronger condition
than response non-contextuality (36): it is equivalent to response
non-contextuality for every possible state $\rho \in \mathrm{St}_1(S)$.

3. Imagining that the game is played a large number of times, the effect
non-contextuality condition can be enforced by the referee by randomly
switching from the “game-playing mode” to a “constraint-checking mode”,
which consists in sending, instead of the state $\rho$, a state chosen at random
from a tomographically complete set of states. By collecting enough
statistics, the referee will be able to identify (up to statistical errors) the
measurements $\{m^x\}_{x \in X}$ and to check (up to statistical errors) whether
they satisfy the effect non-contextuality condition.

4. The constraint that the player's measurements are performed on the state
$\rho$ can also be checked once a tomographic estimate of the measurements
$\{m^x\}_{x \in X}$ is available. For this purpose, the referee only needs to compare
the empirical distribution of the player's answers with the desired
distribution $p(y|x) = (m^x_y|\rho)$.

The physical implementation of a contextual game can also be phrased in
graph-theoretic terms. Given the hypergraph $H = (Y, X)$ associated to the
original game, the player's strategy is completely specified by the function
$\hat{w} : Y \to \mathrm{Eff}(S)$ defined by

$\hat{w}(y) := m^x_y \qquad \forall x \in X : y \in x$ (46)

[recall that the label $x$ is identified with the subset $Y_x \subseteq Y$]. We refer to the
function $\hat{w}(y)$ as an effect-valued weight on the hypergraph $H$:

Definition 22. A function $\hat{w} : Y \to \mathrm{Eff}(S)$ is an effect-valued weight on the
hypergraph $H = (Y, X)$ iff the collection of effects $\{\hat{w}(y)\}_{y \in x}$ is a measurement
for every $x \in X$.

Given an effect-valued weight $\hat{w}(y)$ and a state $\rho$, one obtains a probability
weight $w(y)$, defined as

$w(y) := (\hat{w}(y)|\rho) \qquad \forall y \in Y$. (47)

Once a physical theory has been specified, the goal of the player is to find the
measurements that maximize her expected payoff $\omega$. Among all possible
physical implementations with a system $S$ and a state $\rho \in \mathrm{St}_1(S)$, it is interesting
to consider the ones that lead to the highest payoff. For a given theory $T$, we
denote by $\omega_T$ the maximum payoff that can be obtained by optimizing over all
systems, states, and measurements.


13. Reformulating Consistent Exclusivity as a (device-dependent) phys¬ 
ical principle 

In its original formulation, CE is a principle about probability weights. To
interpret it as a physical principle, one needs to specify what physical situations
give rise to probability weights satisfying Eq. (40). This specification,
however, is far from straightforward. The naive formulation “All the probability
weights arising in Nature satisfy CE” is ultimately wrong, since one can
easily construct examples of quantum measurements giving rise to probability
weights violating the CE property: in fact, for every contextual game, the maximum
payoff achievable in quantum mechanics is equal to the maximum payoff
achievable with arbitrary response-non-contextual strategies,²¹ namely

$\omega_{\mathrm{quantum}} = \omega_{\mathrm{rnc}}$. (48)
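
The equality above rests on the fact that any probability weight can be reproduced by quantum measurements whose POVM elements are proportional to the identity. The sketch below checks this numerically for a qubit. The hypergraph, the weight, the density matrix, and the helper names are all hypothetical choices made for the illustration.

```python
from fractions import Fraction

def scaled_identity(weight, dim=2):
    """POVM element w(y) * I on a dim-dimensional Hilbert space."""
    return [[weight if i == j else Fraction(0) for j in range(dim)] for i in range(dim)]

def trace_of_product(a, b):
    """Tr[a b] for square matrices given as nested lists."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

hyperedges = {"x1": ("a", "b"), "x2": ("b", "c"), "x3": ("a", "c")}
w = {y: Fraction(1, 2) for y in "abc"}
povm = {y: scaled_identity(w[y]) for y in w}

# An arbitrary qubit density matrix (unit trace, positive semidefinite).
rho = [[Fraction(3, 4), Fraction(1, 4)], [Fraction(1, 4), Fraction(1, 4)]]

identity = scaled_identity(Fraction(1))
for x, edge in hyperedges.items():
    total = [[sum(povm[y][i][j] for y in edge) for j in range(2)] for i in range(2)]
    assert total == identity          # each question is a normalized POVM

born = {y: trace_of_product(povm[y], rho) for y in w}
print(born == w)  # True
```

The measurements used here are not projective, which is why the resulting weight is allowed to violate CE: the three answers are mutually exclusive and their probabilities sum to 3/2.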

The fix for this problem is to restrict the validity of CE to probability weights
generated by projective measurements: the correct condition satisfied by quantum
mechanics is “All the probability weights arising from projective measurements
satisfy the CE property”. Note that this is by no means a device-independent
statement, as it refers explicitly to a property of the devices used
to generate the probability weight.

In order to formulate CE as a physical principle, one has first to define the
analogue of the “projective measurements”. This can be done in different ways,
depending on which aspect of projective quantum measurements is chosen as
distinctive. Every definition will lead to a different “CE principle”, potentially
encompassing a different picture of the physical world. For example, in Ref. [1]
we proposed a formulation of the CE principle in terms of sharp measurements:


Definition 23 (SharpCE). A theory satisfies SharpCE (at the $k$-th level of the
hierarchy) iff every probability weight generated by sharp measurements according
to Eq. (47) satisfies CE (at the $k$-th level of the hierarchy).


21 Trivially, every probability weight $w(y)$ defines a set of quantum measurements, the $x$-th
measurement described by the POVM $P^x := \{P^x_y\}_{y \in Y_x}$ defined by $P^x_y := w(y)\, I_S$, where $I_S$ is
the identity operator on the system's Hilbert space. For every system and for every density
matrix $\rho$ one then has $p(y|x) = \mathrm{Tr}[P^x_y\, \rho] = w(y)$. No matter what the system's dimensionality
is, the maximum of the payoff over all quantum measurements coincides with the maximum
over all response-non-contextual strategies.

This formulation of the CE principle has been adopted by Cabello in Refs.
[107], [108], as capturing the intuition at the basis of the formulation of CE in
the graph-theoretic framework. We now explore an alternative formulation, in
terms of spiky measurements:

Definition 24 (SpikyCE). A theory satisfies SpikyCE (at the $k$-th level of the
hierarchy) iff every probability weight generated by spiky measurements according
to Eq. (47) satisfies CE (at the $k$-th level of the hierarchy).

Interestingly, the two formulations turn out to be equivalent if we restrict
our attention to pure measurements, because in this case “sharp” and “spiky”
are equivalent notions. The equivalence is discussed in Section 15, which also
lists other alternative generalizations of the notion of projective measurement
in quantum theory.


14. Deriving Consistent Exclusivity for Spiky Measurements 

Here we provide a derivation of SpikyCE from the following three principles: 

1. Causality 

2. Strong No Disturbance Without Information 

3. Pure State Identification, 

the last of which will be defined precisely later in this section. 


14.1. Reduction to Coexistence of Mutually Exclusive Spiky Effects

Our derivation of SpikyCE proceeds through a sequence of reductions. The
first reduction is based on the following notions:

Definition 25 (Mutually exclusive effects). Two effects $m$ and $m'$ are exclusive
iff there exists a measurement $\mathbf{m}$ such that $\{m, m'\} \subseteq \mathbf{m}$. A set of effects
$\{m_y\}_{y \in E}$ are mutually exclusive iff every pair of effects $\{m, m'\} \subseteq \{m_y\}_{y \in E}$ are
exclusive.

Definition 26 (SpikyCMEE). A theory satisfies Coexistence of Mutually Exclusive
Spiky Effects (SpikyCMEE) iff every set of mutually exclusive spiky
effects can coexist in a measurement.

SpikyCMEE coincides with the formulation of the CE principle used by
Barnum, Müller, and Ududec in Ref. [78]. Their choice of name was motivated
by the following observation:

Proposition 3. SpikyCMEE implies SpikyCE.

The proof is elementary and can be found in Appendix E.

14.2. Deriving SpikyCMEE from Sufficient Orthogonality and Pure State Identification

We now reduce SpikyCMEE to Sufficient Orthogonality combined with a
principle of Pure State Identification. In order to phrase the latter, we need the
following definitions:

Definition 27. An effect $m \in \mathrm{Eff}(S)$ is normalized iff there exists a state
$\rho \in \mathrm{St}(S)$ such that $(m|\rho) = 1$.

Definition 28. Let $m$ and $\varphi$ be an effect and a pure state of system $S$, respectively.
We say that $m$ identifies $\varphi$ iff

1. $(m|\varphi) = 1$

2. $(m|\rho) < 1$ for every state $\rho \neq \varphi$.

For example, in Quantum Theory every rank-one projector, considered as a
measurement effect, identifies a pure state. In a general theory, the fact that
every normalized pure effect identifies a state is a nontrivial property.²² We
refer to it as Pure State Identification.²³

Axiom 4 (Pure State Identification). A theory satisfies Pure State Identification
(PSI) iff every normalized pure effect identifies a pure state.

In a general theory, one has the following

Proposition 4. Pure State Identification and Sufficient Orthogonality imply
SpikyCMEE.

Proof. We first prove SpikyCMEE for pure effects. Let $\{a_i\}_{i=1}^N$ be a set of
mutually exclusive pure effects. By PSI, each pure effect $a_i$ identifies a pure
state $\rho_i$. Since the effects are mutually exclusive, for every pair $\{a_i, a_j\}$ there
exists a measurement $m^{ij}$ such that $\{a_i, a_j\} \subseteq m^{ij}$. Hence, the condition
$(a_i|\rho_i) = 1$ implies $(a_i|\rho_j) = \delta_{ij}$. Since $i$ and $j$ are arbitrary, this means that
the effects $\{a_i\}_{i=1}^N$ are orthogonal. SO then implies that the effects $\{a_i\}_{i=1}^N$
coexist in a measurement. This argument proves the validity of SpikyCMEE for
pure effects. The extension to arbitrary spiky measurements is immediate, since
spiky measurements are coarse-grainings of pure orthogonal measurements. □


22 Think, e. g. of the square bit, where every pure effect happens with probability f on all 
the states on one of the four sides of the square (cf. example [TJ. 

23 A very similar axiom appeared in Hardy’s 2011 axiomatization [ 2 H . I109H . under the name 
Logical Sharpness. Hardy’s axiom is slightly stronger than Pure State Identification, in that 
it requires that every pure state is identified by some pure effect. Another, closely related 
axiom was put forward by Wilce l2§t . who considered a privileged set of measurements with 
the property that each outcome identifies a pure state. The privileged measurements are not 
assumed to be pure from the start, but turn out to be so as a consequence of the axioms. 
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14-3. Derivation of SpikyCE 

Combining proposition H] with theorem [3] we get the desired result: 

Theorem 4. If a theory satisfies Causality, Strong No Disturbance Without 
Information, and Pure State Identification, then it also satisfies SpikyCE. 

900 A derivation of SpikyCE from completely different axioms is provided in Ref. 
[78], where SpikyCE is obtained from the requirement that i) every state can be 
represented as a mixture of perfectly distinguishable pure states and ii) all sets 
of perfectly distinguishable pure states of a given cardinality can be transformed 
into one another by reversible transformations. It is also interesting to compare 
Proposition 0] with the results of Ref. [l], where we formulated SharpCE and 
derived it from a single axiom about sharp measurements. Yet another way 
of deriving CE was found by Wilce 111011 . who interestingly obtained it from a 
requirement about bipartite systems (see footnote 24) and from a requirement about
coarse-graining of tests, very similar in spirit to the axiom used in [1]. The existence of
different, alternative ways to obtain the CE principle provides a good illustration 
of the fact that a device-independent feature can arise from different features of 
the underlying physical theory. 

14.4. Derivation of SpikyCE at all levels of the hierarchy

Deriving SpikyCE at higher levels of the hierarchy is easy if we assume the
Locality of Pure Orthogonal Measurements. This principle guarantees that, for
every pure orthogonal measurement $a^x := \{a^x_y\}_{y \in Y_x}$, the product measurement
with effects

$a^{\mathbf{x}}_{\mathbf{y}} := a^{x_1}_{y_1} \otimes a^{x_2}_{y_2} \otimes \cdots \otimes a^{x_k}_{y_k}$

is also a pure orthogonal measurement. The $k$-th level of the hierarchy then
follows from the application of the SpikyCMEE. In summary, we have proven
the following

Corollary 3. If a theory satisfies Causality, Locality of Pure Orthogonal
Measurements, Strong No Disturbance Without Information, and Pure State
Identification, then it also satisfies SpikyCE at all levels of the hierarchy.

15. Different generalizations of the notion of projective quantum measurement

We have seen that CE, as a physical principle, can be formulated in different
ways, depending on how the notion of “projective quantum measurement” is
generalized to arbitrary physical theories. In this section we discuss four different
generalizations and establish a number of relations between them.


24 More specifically, Wilce requires the existence of a conjugate system, in the sense of [28].
Roughly speaking, a system $S$ is said to have a conjugate $\bar{S}$ if there exists a suitable state of
$S \otimes \bar{S}$ that exhibits perfect correlations for all measurements in a suitable class of privileged
measurements, which we can identify e.g. with the spiky measurements of this article, or with
the sharp measurements of Ref. [1].
15.1. Sharp measurements

Definition 29 (Sharp measurement [1]). A measurement $m = \{m_x\}_{x \in X}$ is
sharp iff it can be implemented by a repeatable and minimally disturbing test
$\mathcal{T} = \{\mathcal{T}_x\}_{x \in X}$, i.e. a test satisfying the repeatability condition

$(m_x|\mathcal{T}_x = (m_x| \quad \forall x \in X$ (49)

and the minimal disturbance condition

$(n_y| \left( \sum_{x \in X} \mathcal{T}_x \right) = (n_y| \quad \forall y \in Y$, (50)

for every measurement $n = \{n_y\}_{y \in Y}$ that is compatible with $m$ (see footnote 25).

An equivalent characterization of sharp measurements is provided by the 
following 

Proposition 5. A measurement $m$ is sharp iff there exists a test $\mathcal{T}$ such that

$(n_{xy}|\mathcal{T}_x = (n_{xy}| \quad \forall x \in X, \ \forall y \in Y$ (51)

for every measurement $\{n_{xy}\}_{x \in X, y \in Y}$ that refines $m$, i.e. $\sum_y n_{xy} = m_x$.

The proof can be found in section III of [1]. Eq. (51) is closely related to the
notion of coherent Lüders rule introduced by Kleinmann in Ref. [111]. Roughly
speaking, the sharp measurements of definition 29 are the measurements that
can be implemented by tests in which each transformation is a coherent Lüders
rule for the corresponding effect (see footnote 26).
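As a sanity check of conditions (49) and (50) in the familiar quantum case, the following minimal sketch verifies them for a qubit measured in the computational basis. It assumes (this is an illustration, not part of the paper's framework) that effects are 2x2 matrices and that the test acts in the Heisenberg picture as $n \mapsto P_x n P_x$; the helper names `lueders`, `is_repeatable` and `is_undisturbed` are hypothetical.

```python
# Illustrative sketch: a qubit projective measurement with the Lueders test.
# Assumption: effects are 2x2 matrices and (n| T_x is represented by P_x n P_x.

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(a, b):
    """Entrywise sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    """Entrywise comparison up to a small tolerance."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

# Projectors m_0 = |0><0| and m_1 = |1><1|.
P = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]

def lueders(x, effect):
    """Heisenberg-picture action of the x-th transformation: P_x effect P_x."""
    return matmul(matmul(P[x], effect), P[x])

def is_repeatable():
    """Condition (49): (m_x| T_x = (m_x| for every outcome x."""
    return all(close(lueders(x, P[x]), P[x]) for x in range(2))

def is_undisturbed(effect):
    """Condition (50): sum_x (n| T_x = (n| for the given effect n."""
    return close(add(lueders(0, effect), lueders(1, effect)), effect)

diag_effect = [[0.3, 0.0], [0.0, 0.8]]   # commutes with P_0, P_1 (compatible)
plus_effect = [[0.5, 0.5], [0.5, 0.5]]   # |+><+|, not compatible with m
```

In this example `is_undisturbed(diag_effect)` holds while `is_undisturbed(plus_effect)` fails, which matches the requirement that only effects compatible with $m$ be left undisturbed.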


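25 We recall that two measurements $m = \{m_x\}_{x \in X}$ and $n = \{n_y\}_{y \in Y}$ are said to be
compatible iff there exists a third measurement $o = \{o_z\}_{z \in Z}$ and two disjoint partitions of $Z$,
denoted by $\{Z^m_x\}_{x \in X}$ and $\{Z^n_y\}_{y \in Y}$ respectively, such that

$m_x = \sum_{z \in Z^m_x} o_z \quad \forall x \in X$,

$n_y = \sum_{z \in Z^n_y} o_z \quad \forall y \in Y$.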

26 Some care is required with such an identification, which sometimes turns out to be
incorrect. The main differences between Refs. [1] and [111] can be summarized as follows:

1. Framework. Ref. [111] associates to a physical system $S$ an order unit vector space
$V_S$, making the following

Assumption 3. $\mathrm{Eff}(S) = \{m \in V_S \mid 0 \le m \le u_S\}$.

Not every OPT satisfies such an assumption, which is strictly stronger than convexity
of the space of effects and is partly related to the so-called No-Restriction Hypothesis
[37, 91] (see Appendix F for details).

2. Positive maps vs physical transformations. A coherent Lüders rule (CLR) is defined
15.2. Maximally discriminating measurements

A third generalization of projective quantum measurements appeared often
in the literature on the reconstructions of quantum theory [22, 24, 13].
In this context it is often convenient to consider measurements that distinguish
perfectly among a maximal set of states, i.e. sets of states $\{\rho_n\}_{n=1}^{N}$ with the
property that there is no state $\rho_{N+1}$ such that the states $\{\rho_n\}_{n=1}^{N+1}$ are jointly
distinguishable. Here we call a measurement that distinguishes among a maximal
set of states a maximally discriminating measurement. In the case of quantum
theory it is easy to see that the maximally discriminating measurements coincide
with the projective measurements.


as a positive linear map $\phi : V_S \to V_S$ satisfying the conditions

$\phi(u_S) = m$ ($m$-compatibility) (52)

$\phi(n) = n \quad \forall n \in V_S : 0 \le n \le m$ (coherence). (53)

Each positive map is regarded as a potential candidate for a physical transformation,
leaving the actual choice of physical transformations open. We argue that the most
sensible way to make such a choice is to start from a full OPT, where the composition
of transformations in parallel and sequence is built in the operational structure, thanks
to the adoption of the categorical framework [61, 62, 32]. This allows one to bypass
problems like the difference between positivity and complete positivity, and the problem
that the correspondence between positive maps and physical transformations may not
be uniquely defined if the axiom of Local Tomography is not satisfied [37, 40].

3. Sharpness vs coherence. The sharpness condition (51) is generally not equivalent to
the coherence condition (53). The two conditions become equivalent under the validity
of the following

Assumption 4. Every two effects $m, n \in V_S$ satisfying $n \le m$ are compatible, that
is, the three effects $n$, $m - n$ and $u_S - m$ can coexist in a measurement allowed by the
theory.

Assumption 4 holds for theories satisfying the Purification axiom (see Corollary 36 of
Ref. [37]) and for theories with a Jordan-algebraic structure [28 1 |7q| . Sharpness and
coherence are potentially different notions for OPTs that do not satisfy Assumption 4.

4. Effects vs measurements. While Ref. [1] focusses on measurements, Ref. [111] focusses
on individual effects. As a result, an effect with a CLR may not be a sharp effect
(i.e. an effect belonging to a sharp measurement): indeed, an effect $m$ can have a CLR
even if there exists no CLR for the complementary effect $u_S - m$ [111].

We have seen that defining a privileged set of measurements is important for the study of
contextuality. Hence, one may want to define measurements starting from CLRs. There is a tricky
issue here: the most obvious definition “CLR measurement := measurement $m = \{m_x\}_{x \in X}$
where each effect has a CLR $\phi_x$” does not have a clear operational meaning, because the
collection of maps $\{\phi_x\}_{x \in X}$ may not correspond to any test allowed by the theory. For this
reason, we suggest to define “CLR measurement := measurement that can be implemented by a
test $\mathcal{T} = \{\mathcal{T}_x\}_{x \in X}$ wherein each transformation induces a CLR for the corresponding effect.”
Adopting this definition, we have the following

Proposition 6. Sharp measurements coincide with CLR measurements in causal OPTs
satisfying Assumptions 3 and 4.

The proof follows from the discussion presented in Appendix F, which also contains more
details on the relations between sharp measurements and CLR measurements.
15.3. Measurements consisting of extremal effects

Yet another possible generalization of projective quantum measurement is
the one in terms of measurements that consist of extremal effects [98], i.e. effects
that are extreme points of the set of effects associated to a given system (see
footnote 27). Note the difference between extremal effects and the notion of pure
effects used in this paper: in quantum theory, the effect $p\,|0\rangle\langle 0|$ with $0 < p < 1$ is
pure in our sense, but is not extremal in the convex set of effects, because it is a
mixture of the effect $|0\rangle\langle 0|$ with the zero effect. On the other hand, the projector
$|0\rangle\langle 0| + |1\rangle\langle 1|$ is an extremal effect, but is not pure, because it can be obtained by
coarse-graining the pure effects $|0\rangle\langle 0|$ and $|1\rangle\langle 1|$. In general, it is easy to see that
the extremal effects in quantum theory are the projectors, while the pure effects are
the rank-one positive operators upper bounded by the identity matrix.
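The distinction between pure and extremal effects can be made concrete with a small sketch. It is restricted, as an assumption of this illustration only, to effects that are diagonal in the computational basis, stored as tuples of eigenvalues; `is_pure`, `is_extremal` and `mix` are hypothetical helper names, and the two tests simply encode the quantum characterization stated above (rank one versus projector).

```python
# Illustrative sketch: diagonal effects a|0><0| + b|1><1| stored as (a, b).
# Assumption: only diagonal effects are considered, so the eigenvalues are
# just the tuple entries.

def is_pure(effect, tol=1e-12):
    """Rank-one test: exactly one non-zero eigenvalue."""
    return sum(1 for v in effect if abs(v) > tol) == 1

def is_extremal(effect, tol=1e-12):
    """Projector test: every eigenvalue is either 0 or 1."""
    return all(abs(v) < tol or abs(v - 1) < tol for v in effect)

def mix(p, e, f):
    """Convex mixture p*e + (1-p)*f of two diagonal effects."""
    return tuple(p * a + (1 - p) * b for a, b in zip(e, f))

zero = (0.0, 0.0)
ket0 = (1.0, 0.0)              # |0><0|
ket1 = (0.0, 1.0)              # |1><1|
half_ket0 = mix(0.5, ket0, zero)                    # 0.5 |0><0|
identity = tuple(a + b for a, b in zip(ket0, ket1)) # coarse-graining of ket0, ket1
```

Here `half_ket0` is pure but not extremal, `identity` is extremal but not pure, and `ket0` is both.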


15.4. Relations among the different definitions

A summary of the possible generalizations of projective quantum measurement
is as follows:

1. Maximally discriminating measurements. Definition based on the notion
of distinguishability of a maximal set of states.

2. Spiky measurements. Definition based on purity and orthogonality.

3. Sharp measurements. Definition based on the dynamical features of the
measurement process, which is required to be repeatable and minimally
disturbing.

4. Measurements consisting of extremal effects. Definition based on the
convex structure of the set of effects.


Not much is known about the relations among these four definitions, except
for a few observations that one can readily make. For pure measurements, one
has the following implications:

Proposition 7. Let $m$ be a pure measurement in a causal theory. Then,

1. if $m$ is maximal, then it is also spiky

2. $m$ is spiky iff it is sharp

3. if $m$ is spiky, then it consists of extremal effects.

The proof is presented in Appendix G. One interesting question here is
under which conditions the implication 1 can be reversed, i.e. under which
conditions a pure spiky measurement is maximal. Here is a possible answer:
Suppose that

1. the theory satisfies Pure State Identification, and

2. the set of effects that give probability 1 on a given state has a non-trivial
lower bound, that is, for every system $S$ and for every state $\rho \in \mathrm{St}(S)$,
there exists an effect $a_\rho \in \mathrm{Eff}(S)$, $a_\rho \neq 0$, such that $a_\rho \le m$ for every effect
$m$ such that $(m|\rho) = 1$.


27 This definition presupposes the mild assumption that such effects form a convex set. 
These two conditions are sufficient to guarantee that all spiky pure
measurements are maximal:

Proposition 8. In a causal theory satisfying Conditions 1-2 every spiky pure 
measurement is maximal. 

The proof is provided in Appendix H. It is worth stressing that the simple
equivalences presented here hold for pure measurements, while the situation
is much more involved for generic measurements. This fact prevents a direct
comparison of the results of this paper with those of Ref. [1], where the main
arguments were based on the properties of non-pure sharp measurements.

16. Conclusions 

In this paper we reviewed the device-independent framework for
nonlocality/contextuality and the framework of general probabilistic theories, with the
aim of bridging the gap between the two approaches. We see a high payoff in
the transfer of results from one paradigm to the other. From the point of view of
quantum axiomatizations, being able to reconstruct a device-independent
principle from the axioms provides a direct access to many fundamental features
of quantum nonlocality and contextuality. From the point of view of quantum
nonlocality/contextuality, the approach of general probabilistic theories offers
the possibility to find a deeper understanding of the device-independent
features, which may help overcome the current difficulties in finding a complete
device-independent characterization.

In this paper we explored both directions. Following Ref. [1], we focussed
on the principles of Local Orthogonality and Consistent Exclusivity and derived
them from principles about the structure of the measurement process. The
derivation presented here differs significantly from the one presented in Ref. [1],
both in the requirements and in the notions used to formulate them. Essentially,
the two papers investigate two different notions of “ideal measurement”,
providing two different and potentially inequivalent generalizations of the notion
of projective measurement in quantum theory. The two generalizations, called
sharp and spiky measurements, respectively, refer to different operational
properties of measurements: repeatability and minimal disturbance for the former,
purity and orthogonality for the latter.

How should we interpret the fact that the same device-independent features
(LO and CE in this case) can be reduced to two different physical pictures?
Several answers are possible. On the one hand, one could argue that
correlations are only a partial aspect of a physical theory and that, in fact, it is even
possible that two different physical theories lead to the same set of correlations.
In this sense, it is no surprise that inequivalent sets of physical principles entail
the same device-independent bounds. On the other hand, one could argue that
the framework of general probabilistic theories is too general, in that it allows
for more theories than those that are actually worth studying. In general, the
inequivalence of two sets of axioms could be due to some artificial and
uninteresting counterexample. Inequivalent axioms could turn out to be equivalent
under some reasonable assumption, the only problem being that the right
assumptions have yet to be pinpointed. As a matter of fact, we believe that both
answers contain elements of truth.

In the present work, a partial simplification was achieved at the level of pure 
measurements, where the difference between sharp and spiky measurements dis¬ 
appears. It is remarkable that, once more 37, Enm, bringing the analysis to 
the level of pure processes simplifies proofs and unites different notions. This 
fact could be taken as a clue that the core of Quantum Theory is encoded in the 
peculiar interaction between the operational level and an underlying world of 
pure processes. From this point of view, the most natural continuation of the
research initiated in this paper is to combine the Measurement Purification axiom
with the State Purification axiom of Ref. [37], seeking a new axiomatization
of Quantum Theory only in terms of Purification features.
Appendix A. Equivalence between causality and no-signalling

Proof. The implication 1 $\Rightarrow$ 2 is an immediate consequence of the uniqueness
of the deterministic effect. To prove the implication 2 $\Rightarrow$ 1, let us assume
that $u_0$ and $u_1$ are two deterministic effects for some system $A$. Consider the
following two-party scenario:

• Party 1 holds system $A$ and party 2 holds system $B$. Systems $A$ and $B$
are prepared in some joint deterministic state $\sigma \in \mathrm{St}_1(A \otimes B)$.

• Party 1 applies either the effect $u_0$ or the effect $u_1$ depending on the value
of her input $x_1 \in \{0, 1\}$. Since both measurements have a single outcome,
in both cases the output $y_1$ can take a single value, say $y_1 = 0$.

• Party 2 has a single measurement setting, say $x_2 = 0$, which consists in
performing a measurement $\{b_{y_2}\}_{y_2 \in Y}$ on her system, getting the outcome
$y_2$.

Defining the probability distributions

$p(y_1, y_2 | x_1, x_2) := (u_{x_1} \otimes b_{y_2} | \sigma)$,

we have that the no-signalling condition becomes

$(u_0 \otimes b_{y_2} | \sigma) = (u_1 \otimes b_{y_2} | \sigma) \quad \forall y_2 \in Y$,

or, equivalently

$(u_0 | \rho_{y_2}) = (u_1 | \rho_{y_2}) \quad \forall y_2 \in Y$,

where $\rho_{y_2}$ is the state $\rho_{y_2} := (\mathcal{I}_A \otimes b_{y_2}) \sigma$. Now, Assumption 1 guarantees that
every state of system $A$ is of the form $\rho_{y_2} = (\mathcal{I}_A \otimes b_{y_2}) \sigma$ for some suitable
state $\sigma$ and some suitable measurement $\{b_{y_2}\}$. Hence, $u_0$ and $u_1$ give the same
probability on every input state. By the very definition of effect (cf. paragraph
II.F of Ref. [37]), this means $u_0 = u_1$. □


Appendix B. Proof of proposition^ 

Proof. Let $m_0$ and $m_1$ be two effects such that $(m_i|\rho_j) = \delta_{ij}$ for arbitrary
$i, j \in \{0, 1\}$. Since every effect belongs to a measurement, there must exist a
measurement $\{\tilde{m}_y\}_{y \in Y}$ such that $m_0 = \tilde{m}_{y_0}$ for some outcome $y_0 \in Y$. By
coarse-graining, the measurement $\{\tilde{m}_y\}_{y \in Y}$ can be turned into a binary
measurement $\{m_0, m_{\neg 0}\}$ defined by

$m_{\neg 0} := \sum_{y \in Y, \, y \neq y_0} \tilde{m}_y$.

By construction one has $(m_0|\rho_0) = (m_{\neg 0}|\rho_1) = 1$ and $(m_0|\rho_1) = (m_{\neg 0}|\rho_0) = 0$.
In other words, the states $\rho_1$ and $\rho_0$ can be perfectly distinguished using the
measurement $\{m_0, m_{\neg 0}\}$. □
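The coarse-graining step of the proof above can be illustrated in the simplest setting. The sketch below assumes a classical three-level system (an assumption of this example, not of the proof), where states are probability vectors, effects are vectors with entries in [0, 1], and the pairing $(m|\rho)$ is the dot product; `coarse_grain_complement` and `prob` are hypothetical helper names.

```python
# Illustrative sketch: binary coarse-graining {m_0, m_not0} in a classical trit.
# Assumption: states are probability vectors, effects are [0, 1]-valued vectors.

def prob(effect, state):
    """Probability (effect|state) as a dot product."""
    return sum(e * s for e, s in zip(effect, state))

def coarse_grain_complement(measurement, y0):
    """Sum of all effects of the measurement except the one with outcome y0."""
    dim = len(measurement[y0])
    return tuple(sum(e[i] for k, e in enumerate(measurement) if k != y0)
                 for i in range(dim))

# A three-outcome measurement containing m_0 as its outcome y0 = 0.
measurement = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
m0 = measurement[0]
m_not0 = coarse_grain_complement(measurement, 0)

rho0 = (1.0, 0.0, 0.0)   # state identified by m_0
rho1 = (0.0, 0.5, 0.5)   # state on which m_0 gives probability 0
```

With these choices the binary measurement `{m0, m_not0}` distinguishes `rho0` from `rho1` without error, as in the statement being proven.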
Appendix C. Proof of lemma Q]

Proof. For every setting $x$, use the measurement purification axiom to represent
the measurement $m^x$ as $(m^x_y| = (M^x_y| \, [\mathcal{I}_S \otimes |\sigma_x)]$ for some orthogonal
measurement $M^x$ on $S \otimes E_x$ and for some state $\sigma_x \in \mathrm{St}(E_x)$, where $E_x$ is a suitable
environment. Since there is a finite number of settings, one can always define
$E := \bigotimes_{x \in X} E_x$, $\sigma := \bigotimes_{x \in X} \sigma_x$ and replace the measurement $M^x$ with a new
spiky measurement $N^x$ given by

$(N^x_y| := (M^x_y| \otimes \bigotimes_{x' \in X, \, x' \neq x} (u_{x'}|$,

where $u_x$ denotes a unit effect on system $E_x$ (note that, since Causality is not
assumed in the hypothesis, the unit effect may not be unique). In this way, the
probability distribution can be expressed as

$p(y|x) = (m^x_y|\rho)$

$= (M^x_y|\rho \otimes \sigma_x)$

$= (N^x_y|\rho \otimes \sigma)$.

□

Appendix D. Violation of SO for polygons with an odd number of vertices

To prove that the effects $\{a_y, a_{y \oplus \frac{n+1}{2}}\}$ cannot coexist in a measurement, we
show that the sum of their probabilities on the state $|\varphi_{y \ominus 1})$ exceeds one. Indeed,
define

$s := (a_y|\varphi_{y \ominus 1}) + (a_{y \oplus \frac{n+1}{2}}|\varphi_{y \ominus 1})$.

Then, Eq. ijTTjl yields 


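$s = \frac{r_n^2 \cos\left(\frac{2\pi}{n}\right) - r_n^2 \cos\left(\frac{3\pi}{n}\right) + 2}{1 + r_n^2}$.

Now, the condition $s > 1$ is equivalent to

$r_n^2 \cos\left(\frac{2\pi}{n}\right) - r_n^2 \cos\left(\frac{3\pi}{n}\right) - r_n^2 + 1 > 0$.

Inserting the definition $r_n := \sqrt{1/\cos(\pi/n)}$ into this inequality, one obtains

$\cos\left(\frac{2\pi}{n}\right) + \cos\left(\frac{\pi}{n}\right) > \cos\left(\frac{3\pi}{n}\right) + 1$,

which is equivalent to

$2 \cos\left(\frac{3\pi}{2n}\right) \cos\left(\frac{\pi}{2n}\right) > 2 \cos^2\left(\frac{3\pi}{2n}\right)$, (D.1)

having used the relation $\cos\alpha + \cos\beta = 2 \cos\frac{\alpha+\beta}{2} \cos\frac{\alpha-\beta}{2}$. Clearly, the
inequality (D.1) is satisfied for every $n \ge 5$. Hence, all the polygons with an odd
number $n > 3$ of vertices violate SO.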
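The inequality can also be checked numerically. The following sketch assumes the standard parametrization of odd polygon theories, with states $(r_n \cos(2\pi i/n), r_n \sin(2\pi i/n), 1)$ and effects obtained by rescaling the same vectors by $1/(1 + r_n^2)$; this parametrization and the function names `polygon_model`, `prob` and `pair_sum` are assumptions of the example and may differ from the conventions used in the main text.

```python
import math

# Illustrative sketch: odd polygon theories in an assumed standard form.
# States:  (r cos(2 pi i / n), r sin(2 pi i / n), 1), with r^2 = 1 / cos(pi / n).
# Effects: the same vectors rescaled by 1 / (1 + r^2).

def polygon_model(n):
    """Return the lists of pure states and pure effects of the n-gon (n odd)."""
    r2 = 1.0 / math.cos(math.pi / n)
    r = math.sqrt(r2)
    states = [(r * math.cos(2 * math.pi * i / n),
               r * math.sin(2 * math.pi * i / n),
               1.0) for i in range(n)]
    effects = [tuple(c / (1.0 + r2) for c in s) for s in states]
    return states, effects

def prob(effect, state):
    """Probability (effect|state) as a dot product."""
    return sum(e * s for e, s in zip(effect, state))

def pair_sum(n, y=0):
    """Sum s of the probabilities of a_y and a_{y + (n+1)/2} on the state y - 1."""
    states, effects = polygon_model(n)
    partner = (y + (n + 1) // 2) % n
    test_state = states[(y - 1) % n]
    return prob(effects[y], test_state) + prob(effects[partner], test_state)
```

Under these assumptions `pair_sum(3)` equals one up to rounding (the triangle is classical), while `pair_sum(n)` exceeds one for odd `n` from 5 upwards, in agreement with (D.1).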


Appendix E. Proof of proposition [3] 

Proof. Let H = (Y, X) be the hypergraph associated to a given contextual game 
and let w : Y — > Eff(S') be the effect-valued weight describing the player’s 
strategy in a physical implementation of the game. Clearly, for every exclusive 
set of vertices E C Y (in the sense of definition m, the effects {w(y)} ye E are 
mutually exclusive (in the sense of definition l25l) . It is also clear that, if the 
mutually exclusive effects $\{w(y)\}_{y \in E}$ coexist in a single measurement, then one
has the inequality

$\sum_{y \in E} (w(y)|\rho) \le 1$, (E.1)

for every state $\rho \in \mathrm{St}_1(S)$. This means that every probability weight $w$ generated
by the effect-valued weight $w$ satisfies CE. □


Appendix F. Sharp measurements and coherent Lüders rules

In the following we provide a more detailed discussion of the difference
between sharp measurements and CLRs.


Appendix F.1. Framework

The framework of Ref. [111] differs from the OPT framework in a number
of significant ways. Ref. [111] associates a physical system $S$ with an order unit
vector space (OUVS), which we denote by $V_S$. The unit in $V_S$ corresponds to
the deterministic effect $u_S$, and every positive element $m \in V_S$ satisfying the
condition $m \le u_S$ is assumed to be an effect, physically realizable in some test.
In the language of OPTs, this amounts to the assumption

Assumption 5. $\mathrm{Eff}(S) = \{m \in V_S \mid 0 \le m \le u_S\}$.

Such a condition means that

1. system $S$ has a unique deterministic effect

2. the set of effects $\mathrm{Eff}(S)$ is convex

3. effects can be “scaled up”: for every effect $m \in \mathrm{Eff}(S)$ and for every scaling
coefficient $\lambda > 1$ satisfying $\lambda m \le u_S$, one has that $\lambda m$ belongs to $\mathrm{Eff}(S)$.
For theories that are not deterministic, the first two conditions can be
operationally motivated as part of the “Causality package”; in particular, see
Corollary 5 of Ref. [37] for the convexity of $\mathrm{Eff}(S)$. Although Causality is a very
natural assumption (one that we also wanted to make in this paper), it is worth
noting that the operational definition of sharp measurement (definition 29) in
terms of repeatability and minimal disturbance can be applied even in exotic
non-causal scenarios, like those arising in Refs. 93, SMESSzl. 

The condition that effects can be scaled up is more specific. It would follow
if one assumed the No-Restriction Hypothesis [37, 91], which guarantees both
the validity of the assumption above and the fact that $\mathrm{Eff}(S)$ is the full dual cone
associated to the set of states $\mathrm{St}(S)$. One way to motivate the No-Restriction
Hypothesis on operational grounds is provided by Barnum, Müller, and Ududec
[78], who showed that the No-Restriction Hypothesis holds if i) all states of
system $S$ can be decomposed into convex combinations of perfectly distinguishable
pure states and ii) every two sets of perfectly distinguishable pure states of the
same cardinality can be connected by a reversible transformation. Interestingly,
one way to bypass the No-Restriction Hypothesis is to assume the Purification
axiom [37], which also guarantees that effects can be scaled up (for the proof,
see Corollary 36 of Ref. [37]).

Appendix F.2. Positive maps vs physical transformations

A direct comparison between Refs. [1] and [111] can be made only for the
restricted set of OPTs that satisfy the conditions 1-3. From now on we restrict
our attention to such theories.

Given an effect $m \in V_S$, Kleinmann defines a coherent Lüders rule (CLR)
for $m$ as a positive linear map $\phi : V_S \to V_S$ satisfying the conditions

$\phi(u_S) = m$ ($m$-compatibility) (F.1)

$\phi(n) = n \quad \forall n \in V_S : 0 \le n \le m$ (coherence). (F.2)

The map $\phi$ is interpreted as a potential candidate for a physical transformation.
Whether or not a given map $\phi$ really represents a physical transformation,
however, is another issue: the definition of CLR only refers to positive maps
satisfying conditions (F.1) and (F.2).


In order to compare Refs. [1] and [111] we need to restrict our attention
to those effects $m$ that admit a CLR with the extra property that the map $\phi$
represents a physical transformation. While Ref. [111] does not specify how
physical transformations are defined, for the sake of comparison we now
assume that some choice has been made. In our opinion, the most sensible way
to make such a choice is to start from a full OPT, where the composition of
transformations in parallel and sequence is built in the operational structure
(thanks to the adoption of the categorical framework by Abramsky and Coecke
[61, 62, 32]). The advantage of using the categorical approach with respect to
the single-system approach is that in this way one can bypass problems like

1. the difference between positivity and complete positivity (maps that send
effects into effects for individual systems may not do so when applied locally
on composite systems), and

2. the fact that the action of a transformation on a composite system may
not be uniquely determined by the linear map $\phi$ (this is often the case
when the axiom of Local Tomography is not satisfied [37, 40]).

Let us denote by $V_{S \to S}$ the set of positive maps induced by physical
transformations, in the following sense:

Definition 30 (cf. Eq. (22) of US). The map $\phi : V_S \to V_S$ is induced by the
physical transformation $\mathcal{T} \in \mathrm{Transf}(S \to S)$ iff for every effect $m \in V_S$ one has
$\phi(m) = m'$, where $m'$ is the effect defined by $(m'| := (m|\mathcal{T}$.

A physical CLR is one where the map $\phi$ belongs to $V_{S \to S}$.

Appendix F.3. Sharpness vs coherence

Equipped with the definition of physical CLR, we can now compare the
coherence condition of Eq. (F.2) with the sharpness condition of Eq. (51). In
general, these two conditions express different operational requirements:
Kleinmann’s coherence condition requires that $\phi$ does not disturb all the effects $n$ that
are “less likely to be triggered” than $m$, in the following sense

Definition 31. The effect $n$ is less likely to be triggered than $m$ iff $n \le m$.

Our sharpness condition requires that $\phi$ does not disturb all the effects
that are “compatible with” $m$. This means that the three effects

$m_1 = n$, $m_2 = m - n$, $m_3 = u_S - m$

coexist in a measurement. Such a condition is stronger than just $n \le m$. Due
to this fact, an effect $m$ may satisfy the sharpness condition (51), and still fail
to satisfy the coherence condition (53). The two conditions become equivalent
under the following

Assumption 6. Every two effects $m, n \in V_S$ satisfying $n \le m$ are compatible.

At present, we (the authors) only know that the assumption holds for
theories satisfying the Purification axiom (see again Corollary 36 of Ref. [37])
and for theories with a Jordan-algebraic structure mill].


Appendix F.4. Effects vs measurements

While we have so far discussed individual effects $m$, it is eventually
interesting to bring the comparison to the level of measurements. Here
there is some ambiguity, since Ref. [111] does not give an explicit definition
of CLR measurement. One might be tempted to define it as a measurement
$m = \{m_x\}_{x \in X}$ for which every effect $m_x$ admits a physical CLR $\phi_x$. This
definition, however, does not have a clear operational meaning, because it is not
a priori clear if the collection of maps $\{\phi_x\}_{x \in X}$ corresponds to a test allowed by
the theory. As per our present knowledge, such a condition is met by theories
satisfying Local Tomography and Purification, and possibly for some theories
with Jordan-algebraic structure. In general, the most reasonable approach is
to define a CLR measurement $m = \{m_x\}_{x \in X}$ as a measurement that can be
implemented by a test $\mathcal{T} = \{\mathcal{T}_x\}_{x \in X}$ for which each transformation induces a
CLR. When this definition is adopted, Proposition 6 follows from the discussion
presented in the previous points.

Appendix G. Proof of Proposition 7

Proof. The implication 1 is an immediate consequence of the definitions: by
definition, a maximal measurement is orthogonal. Hence, a pure maximal
measurement is a pure orthogonal measurement. By definition, pure orthogonal
measurements are a special case of spiky measurements (with “only one spike
per outcome”). The equivalence between spiky and sharp measurements at
point 2 is proven as follows: Since the refinements of a pure measurement are
trivial, Proposition 5 implies that a pure measurement $m = \{m_y\}_{y \in Y}$ is sharp
iff there exists a test $\{\mathcal{M}_y\}_{y \in Y}$ such that

$(m_y|\mathcal{M}_y = (m_y| \quad \forall y \in Y$. (G.1)

In other words, a pure measurement is sharp iff it is repeatable.

We first prove the implication “$m$ is spiky” $\Rightarrow$ “$m$ is sharp”. If $m$ is spiky,
then there exists a set of states $\{\rho_y\}$ such that $(m_y|\rho_{y'}) = \delta_{y,y'}$. Since the theory
satisfies Assumption 2, we can define the measure-and-prepare test $\{\mathcal{M}_y\}$ with
$\mathcal{M}_y := |\rho_y)(m_y|$, which, by construction, satisfies the condition (G.1). Hence,
we proved that $m$ is sharp. Let us prove the converse implication “$m$ is sharp”
$\Rightarrow$ “$m$ is spiky”. If $m$ is sharp, one can take a state $\rho$ such that $(m_y|\rho) > 0$
for every $y$. In a convex theory, one can always find one such state by mixing
sufficiently many states in the state space of the system (see footnote 28). Note that the
possibility of mixing states is guaranteed by Assumption 2. Since $m$ is a sharp
measurement, there exists a test $\mathcal{M} = \{\mathcal{M}_y\}_{y \in Y}$ such that $(m_y| = (u|\mathcal{M}_y$ for
every outcome $y$. Thus, one can define the state

$\rho_y := \frac{\mathcal{M}_y|\rho)}{(m_y|\rho)}$.

By construction, one has $(m_y|\rho_y) = 1$ for every $y$, which, by the normalization
of probabilities, implies the orthogonality condition $(m_y|\rho_{y'}) = \delta_{y,y'}$ for every
$y, y'$. This proves that $m$ is a (pure) orthogonal measurement. Hence, $m$ is
spiky. Finally, we prove the implication at point 3: “$m$ is spiky” $\Rightarrow$ “$m$ is
extremal”. For a given outcome $y \in Y$, suppose that

$m_y = p \, n_y + (1 - p) \, o_y$, (G.2)

28 In the whole proof, this is the only point invoking convexity. 
where $p \in (0, 1)$ is a probability and $n_y$ and $o_y$ are two effects. Since $m$ is
orthogonal, there exists a state $\rho_y$ such that $(m_y|\rho_y) = 1$. Hence, Eq. (G.2)
implies the relation $1 = p \, (n_y|\rho_y) + (1 - p) \, (o_y|\rho_y)$ and, therefore

$(n_y|\rho_y) = (o_y|\rho_y) = 1$. (G.3)

Since $m_y$ is pure, Eq. (G.2) implies $n_y = \alpha_y m_y$ and $o_y = \beta_y m_y$ for two suitable
constants $\alpha_y \ge 0$ and $\beta_y \ge 0$. Using Eq. (G.3) one finally obtains

$1 = (n_y|\rho_y) = \alpha_y (m_y|\rho_y) = \alpha_y$

and

$1 = (o_y|\rho_y) = \beta_y (m_y|\rho_y) = \beta_y$.

In conclusion, one has $n_y = o_y = m_y$. This means that the effect $m_y$ is extremal.
Since the outcome $y$ is generic, we obtained that the whole measurement $m$ is
extremal. □
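The measure-and-prepare construction used in the first half of the proof can be checked in a toy model. The sketch below assumes a classical three-level system (an assumption of this example only), where effects act as row vectors, states as column vectors, and the transformation $|\rho_y)(m_y|$ is the corresponding outer-product matrix; `measure_and_prepare` and `effect_after` are hypothetical helper names.

```python
# Illustrative sketch: the test M_y := |rho_y)(m_y| in a classical trit.
# Assumption: effects are row vectors, states are column vectors, and a
# transformation is a matrix acting on states from the left.

def measure_and_prepare(state, effect):
    """Matrix of |state)(effect|: measure the effect, then prepare the state."""
    return [[s * e for e in effect] for s in state]

def effect_after(effect, transformation):
    """Row vector times matrix, i.e. the effect (effect| composed with the map."""
    dim = len(transformation[0])
    return [sum(effect[i] * transformation[i][j] for i in range(len(effect)))
            for j in range(dim)]

# A pure orthogonal measurement and the states identified by its outcomes.
effects = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
states = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
unit = [1.0, 1.0, 1.0]   # deterministic effect u

test = [measure_and_prepare(s, e) for s, e in zip(states, effects)]
```

For every outcome `y`, `effect_after(effects[y], test[y])` returns `effects[y]`, which is the repeatability condition (G.1), and `effect_after(unit, test[y])` also returns `effects[y]`, so the test implements the measurement.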


Appendix H. Proof of Proposition 8

Proof. We have to prove that a pure spiky measurement is maximal. To this
purpose, observe that for pure measurements “spiky” is synonymous with
“orthogonal”. Now, suppose that $a = \{a_y\}_{y \in Y}$ is a pure orthogonal measurement and
let $\{\rho_y\}_{y \in Y}$ be the set of states such that $(a_y|\rho_{y'}) = \delta_{y,y'}$. By the Pure State
Identification Property, each state $\rho_y$ must be pure; let us denote it as $\rho_y =: \varphi_y$.
Now, we prove that the set $\{\varphi_y\}$ is maximal. Indeed, suppose by contradiction that
the states $\{\varphi_y\} \cup \{\rho\}$ are perfectly distinguishable for some $\rho$, and let $\{m_y\} \cup \{m_\rho\}$ be
the measurement that distinguishes among them. Since $(m_y|\varphi_y) = 1$, we must
have $m_y \ge l_y$, where $l_y \neq 0$ is the lower bound to the set of effects that have
probability 1 on $\varphi_y$ (the existence of the lower bound is guaranteed by
Condition 2). But since $(a_y|\varphi_y) = 1$, we must also have $a_y \ge l_y$. By the purity of $a_y$,
this implies $l_y = p_y a_y$, for some probability $p_y > 0$. This condition implies the
relation

$(a_y|\rho) = p_y^{-1} (l_y|\rho) \le p_y^{-1} (m_y|\rho) = 0 \quad \forall y \in Y$.

Moreover, since the effects $\{a_y\}_{y \in Y}$ form a measurement, we have $(u|\rho) =
\sum_y (a_y|\rho) = 0$, which implies that $\rho$ is the zero state, $\rho = 0$. Hence, the set
of pure states $\{\varphi_y\}$ is maximal. □